Psychometrika 


VOLUME XXII—1957 
JANUARY-DECEMBER 


Editorial Council 


Chairman: HAROLD GULLIKSEN          Managing Editor: LYLE V. JONES

Editors: DOROTHY C. ADKINS          Assistant Managing Editor: B. J. WINER
         PAUL HORST
Editorial Board

DOROTHY C. ADKINS      WM. K. ESTES              FREDERIC M. LORD
R. L. ANDERSON         HENRY E. GARRETT          IRVING LORGE
T. W. ANDERSON         LEO A. GOODMAN            QUINN McNEMAR
J. B. CARROLL          BERT F. GREEN             GEORGE A. MILLER
H. S. CONRAD           J. P. GUILFORD            WM. G. MOLLENKOPF
C. H. COOMBS           HAROLD GULLIKSEN          LINCOLN E. MOSES
L. J. CRONBACH         PAUL HORST                GEORGE E. NICHOLSON
E. E. CURETON          ALSTON S. HOUSEHOLDER     M. W. RICHARDSON
PAUL S. DWYER          LLOYD G. HUMPHREYS        R. L. THORNDIKE
ALLEN EDWARDS          TRUMAN L. KELLEY          LEDYARD TUCKER
MAX D. ENGELHART       ALBERT K. KURTZ           D. F. VOTAW, JR.


PUBLISHED QUARTERLY 
By THE PSYCHOMETRIC SOCIETY 


AT 1407 SHERWOOD AVENUE
RICHMOND 5, VIRGINIA


Psychometrika 


CONTENTS 


NEW ANALYSIS OF VARIANCE FORMULAS FOR TREATING
DATA FROM MUTUALLY PAIRED SUBJECTS
JOSEPH LEV AND ELAINE F. KINDER

ON THE RANKING PROBLEM
ISADORE BLUMEN

ESTIMATION OF ERROR VARIANCES OF TEST SCORES
JOHN A. KEATS

THE DETAILED METHOD OF OPTIMAL REGIONS
PAUL S. DWYER

THE DEVELOPMENT OF HIERARCHICAL FACTOR
SOLUTIONS
JOHN SCHMID AND JOHN M. LEIMAN

A THEORY OF PATTERN ANALYSIS FOR THE PREDICTION
OF A QUANTITATIVE CRITERION
ARDIE LUBIN AND HOBART G. OSBURN

THE EXPECTED VARIANCE OF THE SAMPLING ERRORS
FOR A SET OF ITEM-CRITERION CORRELATIONS
HUBERT E. BROGDEN

A NECESSARY AND SUFFICIENT FORMULA FOR MATRIC
FACTORING
LOUIS GUTTMAN

EXACT PROBABILITIES FOR CONTINGENCY TABLES
USING BINOMIAL COEFFICIENTS
JAMES M. SAKODA AND BURTON H. COHEN

A STOCHASTIC MODEL FOR ROTE SERIAL LEARNING
RICHARD C. ATKINSON

A COMPUTATIONAL PROCEDURE FOR TAU CORRELATION
DESMOND S. CARTWRIGHT

ITERATIVE INVERSE FACTOR ANALYSIS—A RAPID
METHOD FOR CLUSTERING PERSONS
BERNARD M. BASS

VOLUME TWENTY-TWO MARCH 1957, NUMBER 1




SOUTHERN REGIONAL GRADUATE SUMMER SESSIONS IN STATISTICS 


The fourth Southern Regional Graduate Summer Session in Statistics will be held 
from June 12 through July 20 at the Virginia Polytechnic Institute, Blacksburg, Virginia. 

The session lasts six weeks and each course offered carries approximately five quarter
hours of graduate credit. The summer work in statistics may be applied towards residence
requirements at any one of the cooperating institutions, as well as certain other institutions,
in partial fulfillment of residence requirements for graduate degrees.

The faculty for the 1957 session will include E. J. Williams, D. B. DeLury, J. L. …,
as well as the following staff members from the Virginia Polytechnic Institute: W. O. …,
L. S. Brenna, R. A. Bradley, C. W. Clunies-Ross, J. E. Freund, R. J. Freund, B. Harshbarger,
C. Y. Kramer, and R. L. Wine.

Of particular interest will be the lectures by D. B. DeLury on the Sampling of Biological
Populations. Other courses to be offered include Analysis of Variance, Rank Order
Statistics, Stochastic Processes, Probability, Statistical Inference, Theory of Least Squares,
Statistical Methods, Engineering Statistics, and Sampling. Seminars, which will include
many of the foremost statisticians in the eastern part of the United States, will be
conducted each afternoon Monday through Thursday from 3:00 to 4:30. These seminars
will be on some of the more recent research in the field of statistics.

The total tuition fee will be $38.00 for the six-week term. Doctoral courtesy will be
offered to those holding doctoral degrees. Living and other expenses at the Virginia
Polytechnic Institute are reasonable. The Virginia Polytechnic Institute is located on the
scenic Alleghany Mountain plateau 2100 feet above sea level. The summer climate is
delightful.


Inquiries should be addressed to Boyd Harshbarger, Head, Department of Statistics, Virginia
Polytechnic Institute, Blacksburg, Virginia.


Psychometrika 


CONTENTS 


A SIGNIFICANCE TEST FOR THE HYPOTHESIS THAT
TWO VARIABLES MEASURE THE SAME TRAIT
EXCEPT FOR ERRORS OF MEASUREMENT . . . 207
FREDERIC M. LORD

MARKOV PROCESSES IN LEARNING THEORY . . . 221
JOHN G. KEMENY AND J. LAURIE SNELL

NOTE ON THE LEAST SQUARES SOLUTION FOR THE
METHOD OF SUCCESSIVE CATEGORIES . . . 231
R. DARRELL BOCK

COMMUNALITY OF A VARIABLE: FORMULATION BY
CLUSTER ANALYSIS . . . 241
ROBERT C. TRYON

AMOUNTS OF FIXATION AND DISCOVERY IN MAZE
LEARNING BEHAVIOR . . . 261
HERBERT A. SIMON

A MEASURE OF COHERENCE FOR HUMAN
INFORMATION FILTERS . . . 269
RICHARD H. WILCOX

A CORRELATIONAL ANALYSIS OF TRACKING BEHAVIOR . . . 275
W. B. KNOWLES, J. G. HOLLAND, AND E. P. NEWLIN

A MEASURE OF THE GAMBLING RESPONSE-SET IN
OBJECTIVE TESTS . . . 289
ROBERT C. ZILLER

THE UPPER AND LOWER TWENTY-SEVEN PER CENT
RULE . . . 293
EDWARD E. CURETON

J. RAYMOND GERBERICH, Specimen Objective Test Items—A Guide to
Achievement Test Construction . . . 297
A review by ROBERT L. EBEL

SIDNEY SIEGEL, Nonparametric Statistics for the Behavioral Sciences . . . 298
A review by CHARLES M. SOLLEY

BOSE, R. C. ET AL., Tables of Partially Balanced Designs with Two Associate Classes
A review by SAMUEL B. LYERLY




VOLUME TWENTY-TWO 


Psychometrika 


CONTENTS 


NEW PROBLEMS FOR OLD SOLUTIONS
HUBERT E. BROGDEN

OPTIMAL TEST LENGTH FOR MULTIPLE PREDICTION:

STIMULUS AND RESPONSE GENERALIZATION: A STOCHASTIC
MODEL RELATING GENERALIZATION TO
DISTANCE IN PSYCHOLOGICAL SPACE
ROGER N. SHEPARD

THE RELATIONSHIP BETWEEN FACTORIAL COMPOSITION
OF TEST ITEMS AND MEASURES OF TEST RELIABILITY

THE USE OF CONFIGURAL ANALYSIS FOR THE EVALUATION
OF TEST SCORING METHODS
H. G. OSBURN AND ARDIE LUBIN

A MODEL FOR RESPONSE TENDENCY COMBINATION
DAVID BIRCH

PROCEDURES FOR OBTAINING SEPARATE SET AND CONTENT
COMPONENTS OF A TEST SCORE
GERALD C. HELMSTADTER

ITEM SELECTION METHODS FOR INCREASING TEST
HOMOGENEITY
HAROLD WEBSTER


REPORT OF THE TREASURER 
INDEX FOR VOLUME 22 




VOLUME TWENTY-TWO DECEMBER 1957 NUMBER 4 


Psychometrika 


CONTENTS 


THEORY OF LEARNING WITH CONSTANT, VARIABLE, OR 
CONTINGENT PROBABILITIES OF REINFORCEMENT 
W. K. EsTES 


A COMPONENT MODEL FOR STIMULUS VARIABLES IN
DISCRIMINATION LEARNING
C. J. BURKE AND W. K. ESTES

SIMPLE PROOFS OF RELATIONS BETWEEN THE
COMMUNALITY PROBLEM AND MULTIPLE CORRELATION
LOUIS GUTTMAN

A GENERAL LEAST SQUARES SOLUTION FOR SUCCESSIVE
INTERVALS
G. W. DIEDERICH, S. J. MESSICK, AND L. R. TUCKER

ON THE APPLICATIONS OF THE METHOD OF ABSOLUTE
SCALING
CHUNG-TEH FAN

A METHOD OF SCORE CONVERSION THROUGH ITEM
STATISTICS
FRANCES SWINEFORD AND CHUNG-TEH FAN

A REVISED LAW OF COMPARATIVE JUDGMENT
WILLIAM P. HARRIS

A FAST APPROXIMATE ALGEBRAIC FACTOR ROTATION
METHOD TO MAXIMIZE AGREEMENT BETWEEN
LOADINGS AND PREDETERMINED WEIGHTS
DAVID A. RODGERS


VOLUME TWENTY-TWO JUNE, 1957 NUMBER 2


PSYCHOMETRIKA—VOL. 22, NO. 1 
MARCH, 1957 


NEW ANALYSIS OF VARIANCE FORMULAS FOR TREATING 
DATA FROM MUTUALLY PAIRED SUBJECTS* 


JOSEPH LEV† AND ELAINE F. KINDER‡


YERKES LABORATORIES OF PRIMATE BIOLOGY, ORANGE PARK, FLORIDA 



The experimental design considered in this paper is one in which each
of a group of several subjects is observed in the presence of each of the other 
subjects of the group, the entire set of possible pairings being repeated or 
replicated on several occasions. Analysis of variance formulas are described 
for this somewhat unusual design. Both the model of constants and mixed 
model are considered. Reliability formulas growing out of the analysis of 
variance calculations are developed. 


This paper describes new formulas in the analysis of variance. The 
novelty arises from the experimental design in which observations were 
made on pairs of chimpanzees of a group, each animal being paired with 
every other one. The formulas of analysis of variance usually given in text- 
books are not appropriate in this situation because of the incompleteness of 
the layout. An animal cannot be paired with itself; consequently cells which 
correspond to this pairing are blank. 

In the study for which this analysis of variance was developed, observations
on all pairs were repeated five times at irregular intervals over a period
of five years. Because of these repetitions it was possible to study interactions
between pairs of animals as well as other interactions.

An analysis of variance for a design involving mutual pairings of a group
of individuals when the observations are made once only is described by
Quenouille [3, p. 256]. The analysis described by Quenouille does not consider
repetitions or interactions. Although the analysis described in this paper
was developed for an experiment dealing with chimpanzees, it is clearly
applicable to other experiments of the same design.

Each score in this study is a measure of the activity of a chimpanzee
observed for a session of 20 minutes. During the session the chimpanzee was
alone in a cage sufficiently large to permit free movement. The cage contained
several items of equipment, such as two shelves, one high and one low, a
strap suspended from the ceiling on which the animal could swing, and some
toys. A second animal was placed in a small adjoining cage separated from
the first cage only by a grating so that the two animals were in full view of
each other.

‚ "This work was supported in part by grants M-627; M-627C from the National 
Institutes of Health, Public Health Service. 


†New York State Department of Civil Service.
‡New York State Department of Mental Hygiene.




A continuous second by second record was kept of the behavior move- 
ments of the first animal, whom we shall call the protagonist. No comparable 
record was kept of the behavior of the second animal, the partner, whose 
confining cage prevented escape from the grating, and limited its movement. 
Each protagonist was paired with six partners in successive observational 
sessions. The purpose of introducing the partner into the study was to ascertain 
whether the activity of the protagonist varied consistently with different 
partners. 

Each of the seven chimpanzee subjects was paired with every other 
subject in the protagonist-partner relationship, making 42 pairings in each 
series of observations. The record for each session was later scored to obtain
a summary activity measure, which was in essence a weighted average of
scores for the various behavioral acts of the protagonist during the session.
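The pairing scheme is an ordered, all-distinct design: every animal meets every other animal once in each role, and never itself. A minimal Python sketch (an illustration, not part of the original study) enumerates it; the list name `ANIMALS` and the constant `REPLICATIONS` are introduced here for convenience.

```python
from itertools import permutations

# Animal names as they appear in the headings of Table 1.
ANIMALS = ["Banka", "Falla", "Fanny", "Flora", "Jed", "Jent", "Karla"]
REPLICATIONS = 5  # October 1943 through July 1948

# Each (protagonist, partner) pair is ordered, and permutations() never pairs
# an animal with itself, which matches the empty diagonal of the layout.
pairings = list(permutations(ANIMALS, 2))

p = len(ANIMALS)
print(len(pairings))                 # 42 == p * (p - 1)
print(len(pairings) * REPLICATIONS)  # 210 observations in all
```

Because (Banka, Falla) and (Falla, Banka) are different sessions, the count is p(p - 1) rather than p(p - 1)/2.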

Observations involving all 42 pairings were repeated several times 
over a period of five years, giving the five sets of observations which are 
analyzed in this paper. A set of sessions involving all 42 pairings and carried 
out at approximately the same time will be referred to as a replication of the 
experiment. The dates of the five replications were October 1943, November 
1943, October 1944, April 1948 and July 1948. 

The activity scores obtained in the first observational period, October 
1943, are shown in Table 1. In this table the chimpanzees acting as protag- 
onists are named in the column headings; their partners are named in the 
side headings of the rows. A score is the activity measure of the chimpanzee 
named in the heading of the column in which the score occurs when this 
chimpanzee is paired with the chimpanzee named in the corresponding row 
heading. For example, the score 252 in the column headed Falla is the activity 
measure of Falla in the presence of Banka. The activity measure of Banka 
in the presence of Falla is 189. The table immediately suggests analysis of 
variance as a natural method for analyzing the data. The complete analysis 
involves joint consideration of three factors and their interactions. 

Factor 1: Differences among protagonists. This analysis should answer 
the question whether the observed differences among chimpanzees as pro- 
tagonists are small enough to be attributed to chance or sufficiently great 
to be regarded as essential differences among the animals. 

Factor 2: Differences among partners. This analysis is made to determine 
whether the observed differences in activity stimulated by partners are of 
such magnitude that they may be attributed to chance. 

Factor 3: Differences among replications. Differences in activity among 
replications may be attributable to age of animals, to the time between the 
replications, or to their order. 

The interactions among these factors are also of interest in the considera- 
tion of the data at hand. 

Error. The chief error component used in the analysis of variance is the 




second-order interaction. This is always the error component for testing 
significance of the interactions. However, the significance of the factors, or 
main effects, may also be tested by some of the first-order interactions. 
The factors and their interactions will be shown in relation to the activity 
data of this study in several tables. A complete display of all scores would 
require five tables like Table 1, each showing scores for 42 pairings for a 


TABLE 1 
Activity Scores of Seven Chimpanzees Each 
Paired with Every Other Chimpanzee 


(October 1943) 


Protagonist 

Partner Banka Falla Fanny Flora Jed Jent Karla Total 
Banka 252 199 168 
Falla 189 252 21
Fanny 188 279 205 
Flora 160 259 206
et 119 305 219 252 
Karla 178 266 255 211 

170 217 218 189 
Total 1064 1578 1409 1236 


given replication, i.e., four tables in addition to Table 1. These four tables 
will be omitted. The entries in Tables 2, 3, and 4 are totals of entries based on 
the five basic tables. 

In Table 2 each entry is the sum of the scores for five pairings of a
particular protagonist with a particular partner, each of the five scores
being a measure of activity at a given session. In Table 3 each entry is the
sum of the activity scores of six protagonists paired with a given partner at
a given replication. In Table 4 each entry is the sum of the activity scores
of a protagonist paired with each of his six partners at a given replication.

In order to visualize the relationship between the basic activity scores
like those shown in Table 1 and the sums shown in Tables 2, 3 and 4, the


TABLE 2
Sums of Activity Scores for Five Pairings
of Protagonist and Partner

                 Protagonist
Partner     Banka   Falla   Fanny   Flora
Banka          -     1021     996     757
Falla        709        -    1123     825
Fanny        799     1070       -     813
Flora        714     1121     996       -
Jed          800     1161     929     866
Jent         808     1016    1118     848
Karla        735     1123    1212     882
Total       4565     6512    6374    4991




TABLE 3
Sums of Activity Scores of Six
Protagonists Paired with a Given Partner

                                   Replication
Partner    Oct. 1943  Nov. 1943  Oct. 1944  Apr. 1948  July 1948   Total
Banka         1129        950       1167        980       1009      5235
Falla         1286        911        993        977        871      5038
Fanny         1149        969       1126        826        892      4962
Flora         1119       1131       1081        934        899      5164
Jed           1343       1021       1065       1089        939      5457
Jent          1263        992       1232        995        919      5401
Karla         1224       1150       1173       1011        938      5496
Total         8513       7124       7837       6812       6467     36753

TABLE 4
Sums of Activity Scores for Each
Protagonist Paired with Six Partners

                                   Protagonist
Replication  Banka  Falla  Fanny  Flora   Jed   Jent  Karla   Total
Oct. 1943     1064   1578   1409   1236   1010   1159   1057    8513
Nov. 1943      918   1143   1274    927    873   1043    946    7124
Oct. 1944     1060   1592   1143   1091    967    750   1234    7837
Apr. 1948      814   1053   1358    862    718   1018    989    6812
July 1948      709   1146   1190    875    756    689   1102    6467
Total         4565   6512   6374   4991   4324   4659   5328   36753


reader will note that the totals of the rows in Table 1 are recorded in the first
column of Table 3, and the totals of the columns in Table 1 are the entries in
the first row of Table 4. The remaining entries in Tables 2, 3, and 4 are obtained
similarly.

Formulas and computations for the analysis of variance will now be
presented. The mathematical basis for the formulas will be sketched briefly
in the mathematical appendix. The following symbolism will be needed for
the formulas:

$X_{ijk}$ = the basic activity score resulting from a single observation session
in which the $i$th partner is paired with the $j$th protagonist in the
$k$th replication.

For the study at hand $i$ ranges from 1 through 7,
$j$ ranges from 1 through 7,
$k$ ranges from 1 through 5,
$i \neq j$ for the same observation.




$X_{ij\cdot}$ = the sum of all observations for the $i$th partner and the $j$th protagonist,
the summation being made over all replications. This is an
entry in Table 2.

$X_{i\cdot k}$ = the sum of all activity scores for the $i$th partner in the $k$th replication,
the summation extending over all protagonists. This is an entry in
Table 3.

$X_{\cdot jk}$ = the sum of all observations for the $j$th protagonist in the $k$th replication.
This is an entry in Table 4.

$X_{i\cdot\cdot}$ = the sum of all observations on the $i$th partner for all protagonists
and all replications. It is an entry in the column headed "Total" in
Table 2.

$X_{\cdot j\cdot}$ = the sum of all observations on the $j$th protagonist, the summation
extending over all partners and all replications. It is an entry in the
row headed "Total" in Table 2 (also Table 4).

$X_{\cdot\cdot k}$ = the sum of all observations in the $k$th replication. It is an entry in
the row headed "Total" in Table 3.

$X_{\cdot\cdot\cdot}$ = the total of all observations in all replications.

$p$ = the number of animals in the study.

$r$ = the number of replications.


In the following formulas SS is an abbreviation for sum of squares,
MS for mean square, df for degrees of freedom, and F is the usual ratio of
mean squares. For variation among partners:


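$$\text{SS} = \frac{p-1}{rp(p-2)}\sum_i X_{i\cdot\cdot}^2 + \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} + \frac{1}{rp(p-1)(p-2)}\sum_j X_{\cdot j\cdot}^2 - \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)}, \tag{1}$$

$df = p - 1$.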


For variation among protagonists: 


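$$\text{SS} = \frac{p-1}{rp(p-2)}\sum_j X_{\cdot j\cdot}^2 + \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} + \frac{1}{rp(p-1)(p-2)}\sum_i X_{i\cdot\cdot}^2 - \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)}, \tag{2}$$

$df = p - 1$.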


For variation among replications:

$$\text{SS} = \frac{1}{p(p-1)}\sum_k X_{\cdot\cdot k}^2 - \frac{X_{\cdot\cdot\cdot}^2}{rp(p-1)}, \tag{3}$$

$df = r - 1$.




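For partner × protagonist interaction:

$$\text{SS} = \frac{1}{r}\sum_i\sum_j X_{ij\cdot}^2 - \frac{p-1}{rp(p-2)}\Big(\sum_i X_{i\cdot\cdot}^2 + \sum_j X_{\cdot j\cdot}^2\Big) - \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} + \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)}, \tag{4}$$

$df = p^2 - 3p + 1$.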


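For partner × replication interaction:

$$\begin{aligned}\text{SS} ={}& \frac{p-1}{p(p-2)}\sum_i\sum_k X_{i\cdot k}^2 + \frac{2}{p(p-2)}\sum_i\sum_k X_{i\cdot k}X_{\cdot ik} + \frac{1}{p(p-1)(p-2)}\sum_j\sum_k X_{\cdot jk}^2 - \frac{1}{(p-1)(p-2)}\sum_k X_{\cdot\cdot k}^2 \\ &- \frac{p-1}{rp(p-2)}\sum_i X_{i\cdot\cdot}^2 - \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} - \frac{1}{rp(p-1)(p-2)}\sum_j X_{\cdot j\cdot}^2 + \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)},\end{aligned} \tag{5}$$

$df = (p - 1)(r - 1)$.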


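For protagonist × replication interaction:

$$\begin{aligned}\text{SS} ={}& \frac{p-1}{p(p-2)}\sum_j\sum_k X_{\cdot jk}^2 + \frac{2}{p(p-2)}\sum_i\sum_k X_{i\cdot k}X_{\cdot ik} + \frac{1}{p(p-1)(p-2)}\sum_i\sum_k X_{i\cdot k}^2 - \frac{1}{(p-1)(p-2)}\sum_k X_{\cdot\cdot k}^2 \\ &- \frac{p-1}{rp(p-2)}\sum_j X_{\cdot j\cdot}^2 - \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} - \frac{1}{rp(p-1)(p-2)}\sum_i X_{i\cdot\cdot}^2 + \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)},\end{aligned} \tag{6}$$

$df = (p - 1)(r - 1)$.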


For error: 


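$$\begin{aligned}\text{SS} ={}& \sum_i\sum_j\sum_k X_{ijk}^2 - \frac{1}{r}\sum_i\sum_j X_{ij\cdot}^2 - \frac{p-1}{p(p-2)}\Big(\sum_i\sum_k X_{i\cdot k}^2 + \sum_j\sum_k X_{\cdot jk}^2\Big) - \frac{2}{p(p-2)}\sum_i\sum_k X_{i\cdot k}X_{\cdot ik} \\ &+ \frac{1}{(p-1)(p-2)}\sum_k X_{\cdot\cdot k}^2 + \frac{p-1}{rp(p-2)}\Big(\sum_i X_{i\cdot\cdot}^2 + \sum_j X_{\cdot j\cdot}^2\Big) + \frac{2}{rp(p-2)}\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot} - \frac{X_{\cdot\cdot\cdot}^2}{r(p-1)(p-2)},\end{aligned} \tag{7}$$

$df = (r - 1)(p^2 - 3p + 1)$.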
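Formulas (1) through (7) depend on the data only through ten auxiliary sums, which the text lists for the activity data. The following Python sketch (an illustration, not the authors' computing procedure) evaluates the seven sums of squares from those sums. The dataclass `AuxSums` and its field names are hypothetical labels introduced here; the partners line mirrors (2) with the partner and protagonist totals exchanged. With p = 7 and r = 5 the protagonist, replication, interaction, and error lines agree closely with the legible entries of Table 5.

```python
from dataclasses import dataclass


@dataclass
class AuxSums:
    """Auxiliary sums used in formulas (1)-(7). Field names are illustrative."""
    sq_cells: float   # sum over i, j, k of X_{ijk}^2
    sq_ij: float      # sum over i, j of X_{ij.}^2   (Table 2 entries)
    sq_ik: float      # sum over i, k of X_{i.k}^2   (Table 3 entries)
    sq_jk: float      # sum over j, k of X_{.jk}^2   (Table 4 entries)
    cross_k: float    # sum over i, k of X_{i.k} X_{.ik}
    sq_part: float    # sum over i of X_{i..}^2
    sq_prot: float    # sum over j of X_{.j.}^2
    sq_rep: float     # sum over k of X_{..k}^2
    cross: float      # sum over i of X_{i..} X_{.i.}
    grand_sq: float   # X_{...}^2


def sums_of_squares(s: AuxSums, p: int, r: int) -> dict:
    """Return {source: (SS, df)} following the pattern of formulas (1)-(7)."""
    # Denominators for tables summed over replications ...
    d1, d2, d3 = r * p * (p - 2), r * p * (p - 1) * (p - 2), r * (p - 1) * (p - 2)
    # ... and the same denominators for a single replication (r = 1).
    e1, e2, e3 = p * (p - 2), p * (p - 1) * (p - 2), (p - 1) * (p - 2)

    partners = ((p - 1) * s.sq_part + 2 * s.cross) / d1 + s.sq_prot / d2 - s.grand_sq / d3
    protagonists = ((p - 1) * s.sq_prot + 2 * s.cross) / d1 + s.sq_part / d2 - s.grand_sq / d3
    replications = s.sq_rep / (p * (p - 1)) - s.grand_sq / (r * p * (p - 1))
    part_x_prot = (s.sq_ij / r - ((p - 1) * (s.sq_part + s.sq_prot) + 2 * s.cross) / d1
                   + s.grand_sq / d3)
    # Per-replication analogue of (1), minus the pooled partner SS.
    part_x_rep = (((p - 1) * s.sq_ik + 2 * s.cross_k) / e1 + s.sq_jk / e2
                  - s.sq_rep / e3 - partners)
    # Per-replication analogue of (2), minus the pooled protagonist SS.
    prot_x_rep = (((p - 1) * s.sq_jk + 2 * s.cross_k) / e1 + s.sq_ik / e2
                  - s.sq_rep / e3 - protagonists)
    error = (s.sq_cells - s.sq_ij / r
             - ((p - 1) * (s.sq_ik + s.sq_jk) + 2 * s.cross_k) / e1
             + s.sq_rep / e3
             + ((p - 1) * (s.sq_part + s.sq_prot) + 2 * s.cross) / d1
             - s.grand_sq / d3)
    inter_df = p * p - 3 * p + 1
    return {
        "partners": (partners, p - 1),
        "protagonists": (protagonists, p - 1),
        "replications": (replications, r - 1),
        "partner x protagonist": (part_x_prot, inter_df),
        "partner x replication": (part_x_rep, (p - 1) * (r - 1)),
        "protagonist x replication": (prot_x_rep, (p - 1) * (r - 1)),
        "error": (error, (r - 1) * inter_df),
    }


paper_sums = AuxSums(6_830_715, 33_060_491, 39_164_035, 40_314_363, 38_728_822,
                     193_230_675, 197_574_167, 272_866_547, 192_148_558, 1_350_783_009)
for source, (ss, df) in sums_of_squares(paper_sums, p=7, r=5).items():
    print(f"{source:26s} SS={ss:12,.1f} df={df:3d} MS={ss / df:10,.1f}")
```

Floating-point arithmetic is adequate here because every sum is an integer well below 2**53; `fractions.Fraction` could be substituted if exact rational results were wanted.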




The formulas will be applied to computations relating to the activity
data of the study. The sum of squares $\sum_i\sum_j\sum_k X_{ijk}^2$ is not available from the
tables because the original scores are not shown. Additional sums of squares
will be shown to help the reader, together with their source.

$\sum_i\sum_j\sum_k X_{ijk}^2$ = 6,830,715, from original measures,
$\sum_i\sum_j X_{ij\cdot}^2$ = 33,060,491, from entries of Table 2,
$\sum_i\sum_k X_{i\cdot k}^2$ = 39,164,035, from entries of Table 3,
$\sum_j\sum_k X_{\cdot jk}^2$ = 40,314,363, from entries of Table 4,
$\sum_i\sum_k X_{i\cdot k}X_{\cdot ik}$ = 38,728,822, from entries of Tables 3 and 4,
$\sum_i X_{i\cdot\cdot}^2$ = 193,230,675, from row totals of Table 2,
$\sum_j X_{\cdot j\cdot}^2$ = 197,574,167, from column totals of Table 2,
$\sum_k X_{\cdot\cdot k}^2$ = 272,866,547, from row totals of Table 4,
$\sum_i X_{i\cdot\cdot}X_{\cdot i\cdot}$ = 192,148,558, from totals of Table 2,
$X_{\cdot\cdot\cdot}^2$ = 1,350,783,009.
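Every sum in this list except the first two can be recomputed from Tables 3 and 4 alone. A minimal Python sketch, assuming the table entries as printed, with Jed's July 1948 protagonist total read as 756 (the value required by the row and column totals of Table 4); the dictionary and key names are illustrative.

```python
# Partner totals X_{i.k} from Table 3, one list per partner, replications in
# the order Oct. 1943, Nov. 1943, Oct. 1944, Apr. 1948, July 1948.
TABLE3 = {
    "Banka": [1129, 950, 1167, 980, 1009],
    "Falla": [1286, 911, 993, 977, 871],
    "Fanny": [1149, 969, 1126, 826, 892],
    "Flora": [1119, 1131, 1081, 934, 899],
    "Jed":   [1343, 1021, 1065, 1089, 939],
    "Jent":  [1263, 992, 1232, 995, 919],
    "Karla": [1224, 1150, 1173, 1011, 938],
}

# Protagonist totals X_{.jk} from Table 4, stored per animal (a column of the
# printed table), in the same replication order and the same animal order.
TABLE4 = {
    "Banka": [1064, 918, 1060, 814, 709],
    "Falla": [1578, 1143, 1592, 1053, 1146],
    "Fanny": [1409, 1274, 1143, 1358, 1190],
    "Flora": [1236, 927, 1091, 862, 875],
    "Jed":   [1010, 873, 967, 718, 756],
    "Jent":  [1159, 1043, 750, 1018, 689],
    "Karla": [1057, 946, 1234, 989, 1102],
}


def auxiliary_sums(t3, t4):
    names = list(t3)
    part_tot = [sum(t3[n]) for n in names]              # X_{i..}
    prot_tot = [sum(t4[n]) for n in names]              # X_{.i.}, same animal
    rep_tot = [sum(col) for col in zip(*t3.values())]   # X_{..k}
    grand = sum(part_tot)                               # X_{...}
    return {
        "sq_ik": sum(x * x for row in t3.values() for x in row),
        "sq_jk": sum(x * x for row in t4.values() for x in row),
        # same animal, same replication: partner total times protagonist total
        "cross_k": sum(x * y for n in names for x, y in zip(t3[n], t4[n])),
        "sq_part": sum(x * x for x in part_tot),
        "sq_prot": sum(x * x for x in prot_tot),
        "sq_rep": sum(x * x for x in rep_tot),
        "cross": sum(x * y for x, y in zip(part_tot, prot_tot)),
        "grand_sq": grand * grand,
        "rep_tot": rep_tot,
        "rep_tot_from_t4": [sum(col) for col in zip(*t4.values())],
    }


sums = auxiliary_sums(TABLE3, TABLE4)
for key in ("sq_ik", "sq_jk", "cross_k", "sq_part", "sq_prot", "sq_rep", "cross", "grand_sq"):
    print(f"{key:9s} {sums[key]:,}")
```

The replication totals obtained from the two tables must coincide, which gives a quick check on transcription errors in either table.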


The analysis of variance is shown in Table 5. 

The analysis of variance in Table 5 is based on a model in which the 
components of a measure are constants, except for the error component. 
Consequently the conclusions are valid only for the seven animals in the 
study. 

It is more interesting to study a second model in which the seven animals 
are regarded as a random sample from a population of similar animals. This 
is a mixed model described in the mathematical appendix. All components 
involving animals are then treated as random variables. Only the general 
mean and the component due to the main effect of replications are constants 
in this model. Using this model the interactions are still tested as in Table 5, 
but the main effects require more complex tests. 

Tests for partner, or protagonist, main effects are made directly by the
ratio of mean square of the main effect to the partner × protagonist
interaction. Thus the test for significance of differences among protagonists is


F = 24,794.0/769.3 = 32.2, 


which is still significant at the 0.5 per cent level. 
The test for significance of variation among replications is more complex.
Here

$$\begin{aligned}
y_1 &= \frac{(p-1)^2}{p(p-2)}\,(\text{mean square of protagonist} \times \text{replication interaction}),\\
y_2 &= \frac{(p-1)^2}{p(p-2)}\,(\text{mean square of partner} \times \text{replication interaction}),\\
y_3 &= \frac{p^2-2p+2}{p(p-2)}\,(\text{mean square of error}).
\end{aligned} \tag{8}$$

$$F = \frac{\text{mean square of replications}}{y_1 + y_2 - y_3}. \tag{9}$$

This ratio has an F distribution (approximately) with $r - 1$ degrees of
freedom in the numerator and

$$n_2 = \frac{(y_1 + y_2 - y_3)^2}{\dfrac{y_1^2}{(r-1)(p-1)} + \dfrac{y_2^2}{(r-1)(p-1)} + \dfrac{y_3^2}{(r-1)(p^2-3p+1)}} \tag{10}$$

degrees of freedom in the denominator.
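A minimal Python sketch of the replication test follows. It assumes the weights (p - 1)^2/[p(p - 2)] on the two animal-by-replication mean squares and (p^2 - 2p + 2)/[p(p - 2)] on the error mean square, chosen so that y1 + y2 - y3 has the same expectation as the replications mean square apart from the fixed replication effects under the expected mean squares of the appendix; the function name is hypothetical. The denominator degrees of freedom use a Satterthwaite-type approximation.

```python
def replication_f_test(ms_rep, ms_prot_rep, ms_part_rep, ms_error, p, r):
    """Approximate F test for replications in the mixed model (a sketch)."""
    lam = (p - 1) ** 2 / (p * (p - 2))
    y1 = lam * ms_prot_rep
    y2 = lam * ms_part_rep
    y3 = (2 * lam - 1) * ms_error          # equals (p^2 - 2p + 2) / [p(p - 2)]
    denom = y1 + y2 - y3
    df_inter = (r - 1) * (p - 1)           # df of each animal x replication MS
    df_error = (r - 1) * (p * p - 3 * p + 1)
    # Satterthwaite-type approximation for the denominator degrees of freedom.
    n2 = denom ** 2 / (y1 ** 2 / df_inter + y2 ** 2 / df_inter + y3 ** 2 / df_error)
    return ms_rep / denom, r - 1, n2


# Mean squares of Table 5 (error MS = 66,012 / 116).
f, n1, n2 = replication_f_test(16130.5, 2762.0, 806.2, 569.1, p=7, r=5)
print(f"F = {f:.2f}, n1 = {n1}, n2 = {n2:.1f}")
```

With these inputs the sketch gives n1 = 4, an n2 that rounds to 26, and an F close to the reported 5.25.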

For the data in Table 5 the test for differences among replications
indicates that F = 5.25, $n_1$ = 4, and $n_2$ = 26. This value of F also shows
significance at the 0.5 per cent level.

The treatment of the replication component as a constant in the second 
model calls for justification. This treatment was adopted because the replica- 
tions were spread over a long period, with the consequent changes which 
may be attributable to time. In fact the data show a fairly consistent decline 
in average activity over the replications. It would seem illogical, therefore, 
to treat the replications as a random sample from a population of replications. 
Even if the replications had been concentrated within a shorter period, one 
would hesitate to regard them as a random sample because of possible effects 
of order among the replications. 

The second model leads to a measure of the reliability of the mean score
of a protagonist over all his partners at a single replication, in our symbolism
$X_{\cdot jk}/(p - 1)$. The formula is developed in the appendix but will be
described here.

Sample estimates of the variance components which arise from the
second model are computed as follows. In the symbolism used, the sub-
scripts a, b, c refer to partners, protagonists, and replications, respectively.
A single subscript indicates a main effect or error. A pair of subscripts indicates
an interaction. Using the mean squares derived from formulas (1) through (7),


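$$s^2 = \mathrm{MS}(\text{error}), \tag{11}$$

$$s_{ac}^2 = \frac{(r-1)(p-1)}{rp(p-2)}\,\big(\mathrm{MS}(\text{partner} \times \text{replication}) - s^2\big), \tag{12}$$

$$s_{bc}^2 = \frac{(r-1)(p-1)}{rp(p-2)}\,\big(\mathrm{MS}(\text{protagonist} \times \text{replication}) - s^2\big), \tag{13}$$

$$s_{ab}^2 = \frac{1}{r}\,\big(\mathrm{MS}(\text{partner} \times \text{protagonist}) - s^2\big), \tag{14}$$

$$s_b^2 = \frac{p-1}{rp(p-2)}\,\big(\mathrm{MS}(\text{protagonist}) - \mathrm{MS}(\text{partner} \times \text{protagonist})\big), \tag{15}$$

$$s_a^2 = \frac{p-1}{rp(p-2)}\,\big(\mathrm{MS}(\text{partner}) - \mathrm{MS}(\text{partner} \times \text{protagonist})\big). \tag{16}$$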
The reliability of the mean of a protagonist over all partners at a single
replication is then

$$\text{Replication reliability} = \frac{(p-1)s_b^2 + s_a^2 + s_{ab}^2}{(p-1)s_b^2 + s_a^2 + s_{ab}^2 + (p-1)s_{bc}^2 + s_{ac}^2 + s^2}, \tag{17}$$

which is the correlation between mean scores for the same protagonists and
partners at separate replications. From Table 5,


TABLE 5
Analysis of Variance of Activity Data
with Second-Order Interaction as Error

Source of Variation                      Sum of Squares  Degrees of Freedom  Mean Square      F
Partners                                        …                 6               …           …
Protagonists                                148,764               6           24,794.0        …
Replications                                    …                 4           16,130.5     28.34*
Partner × Protagonist Interaction               …                29              769.3      1.35
Partner × Replication Interaction            19,349              24              806.2      1.42
Protagonist × Replication Interaction           …                24            2,762.0      4.85*
Error                                        66,012             116              569
*Significant at 0.5% level
$$\begin{aligned}
s^2 &= 569,\\
s_{ac}^2 &= \tfrac{24}{175}(806 - 569) = 32,\\
s_{bc}^2 &= \tfrac{24}{175}(2{,}762 - 569) = 301,\\
s_{ab}^2 &= \tfrac{1}{5}(769 - 569) = 40,\\
s_b^2 &= \tfrac{6}{175}(24{,}794 - 769) = 824,\\
s_a^2 &= \tfrac{6}{175}(\ldots) = 1,
\end{aligned}$$

$$\text{Replication reliability} = \frac{6(824) + 1 + 40}{6(824) + 1 + 40 + 6(301) + 32 + 569} = .674.$$
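The worked example above can be sketched in Python. The function names and dictionary keys are hypothetical; the mean squares are the rounded values used in the example, and s_a^2 is set to 1 as reported there, since the partners mean square is not used in this sketch.

```python
def variance_components(ms, p, r):
    """Estimates (11)-(16) from a dict of mean squares (keys are illustrative)."""
    s2 = ms["error"]                                   # (11)
    k_int = (r - 1) * (p - 1) / (r * p * (p - 2))       # 24/175 when p = 7, r = 5
    k_main = (p - 1) / (r * p * (p - 2))                # 6/175 when p = 7, r = 5
    comps = {
        "e": s2,
        "ac": k_int * (ms["partner x replication"] - s2),             # (12)
        "bc": k_int * (ms["protagonist x replication"] - s2),         # (13)
        "ab": (ms["partner x protagonist"] - s2) / r,                 # (14)
        "b": k_main * (ms["protagonists"] - ms["partner x protagonist"]),  # (15)
    }
    if "partners" in ms:                                              # (16)
        comps["a"] = k_main * (ms["partners"] - ms["partner x protagonist"])
    return comps


def replication_reliability(s, p):
    """Formula (17): a protagonist's mean over partners at one replication."""
    stable = (p - 1) * s["b"] + s["a"] + s["ab"]
    return stable / (stable + (p - 1) * s["bc"] + s["ac"] + s["e"])


ms = {"error": 569, "partner x replication": 806, "protagonist x replication": 2762,
      "partner x protagonist": 769, "protagonists": 24794}
est = variance_components(ms, p=7, r=5)
est["a"] = 1  # s_a^2 as reported in the worked example
print({k: round(v, 1) for k, v in est.items()})
print(round(replication_reliability(est, p=7), 3))
```

Feeding in the rounded components printed above (569, 32, 301, 40, 824, 1) gives the same reliability to three decimals.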


In addition to replication reliability, there is also partner reliability.
This measure of reliability is based on the mean score of a protagonist
with one partner over all replications, $X_{ij\cdot}/r$. The resulting
reliability is then

$$\text{Partner reliability} = \frac{r s_b^2}{r s_b^2 + r s_a^2 + r s_{ab}^2 + s^2}, \tag{18}$$

which is the correlation between the mean scores of protagonists computed
over all replications but with differing partners. For the data in Table 5
this reliability is .842.

Where both replications and partners differ, but protagonists remain
the same, the reliability is


(19) Partner-replication reliability — ^ E Eur tee Xu 
For the data in Table 5 this reliability is .486. 
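The partner reliability (18) is a one-line computation; the Python sketch below uses the rounded variance components from the worked example for Table 5. The function name and dictionary keys are illustrative.

```python
def partner_reliability(s, r):
    """Formula (18): a protagonist's mean with one partner over all r replications."""
    return r * s["b"] / (r * s["b"] + r * s["a"] + r * s["ab"] + s["e"])


# Rounded variance components from the worked example for Table 5.
components = {"e": 569, "ac": 32, "bc": 301, "ab": 40, "b": 824, "a": 1}
print(round(partner_reliability(components, r=5), 3))  # 0.842
```

Averaging over r replications divides only the error term by r, which is why this reliability is higher than the single-replication value of .674.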


Mathematical Appendix 
It is the purpose of this appendix to state the mathematical basis for 
the formulas used in the paper. 
The sums of squares were derived by use of the following model: 
$$X_{ijk} = m + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + e_{ijk}, \qquad X_{iik} = 0,$$

$e_{ijk}$ = a normally distributed variable with mean 0 and variance $\sigma^2$ for
all combinations of $i, j, k$. The remaining values $m$, $a_i$, $b_j$, etc. are
constants satisfying the conditions stated below.

$m$ = the mean of all observations,
$a_i$ = the contribution of the $i$th partner,
$b_j$ = the contribution of the $j$th protagonist,
$c_k$ = the contribution of the $k$th replication,
$(ab)_{ij}$ = the interaction of the $i$th partner with the $j$th protagonist,
$(ac)_{ik}$ = the interaction of the $i$th partner with the $k$th replication,
$(bc)_{jk}$ = the interaction of the $j$th protagonist with the $k$th replication.

The constants satisfy the following conditions:

$$\sum_i a_i = 0,\quad \sum_j b_j = 0,\quad \sum_k c_k = 0,\quad (ab)_{ii} = 0,\quad \sum_i (ab)_{ij} = 0,\quad \sum_j (ab)_{ij} = 0,$$

$$\sum_i (ac)_{ik} = 0,\quad \sum_k (ac)_{ik} = 0,\quad \sum_j (bc)_{jk} = 0,\quad \sum_k (bc)_{jk} = 0.$$

The determination of the sums of squares is carried out by least squares
using methods described in several well-known references; see for example
[1], [2] and [5]. A full derivation of formulas (1) through (7) is lengthy and
will not be carried out here. However, a sketch of the method will be presented.

The likelihood ratio method described in sections 8.3 and 8.4 of [5],
using particularly Theorem 4, will be adopted. To apply this theorem set
up the sum of squares

$$U = \sum_i\sum_j\sum_k \big[X_{ijk} - m - a_i - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}\big]^2.$$

Then find values of the constants which minimize $U$, subject to the conditions
on the constants. If the constants found in this way by least squares are
distinguished by circumflexes as $\hat m$, $\hat a_i$, etc., then

$$n\hat\sigma^2 = \sum_i\sum_j\sum_k \big[X_{ijk} - \hat m - \hat a_i - \hat b_j - \hat c_k - \widehat{(ab)}_{ij} - \widehat{(ac)}_{ik} - \widehat{(bc)}_{jk}\big]^2,$$

where $n = rp(p-1)$ is the number of observations and $\hat\sigma^2$ is the estimated error variance.

To test a hypothesis that certain of the constants are zero, similar sums
of squares are formed with these constants omitted. For example, to test the
hypothesis that the $a_i$ are all zero, form the sum of squares

$$U_a = \sum_i\sum_j\sum_k \big[X_{ijk} - m - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}\big]^2.$$

In terms of the least squares solutions, using the attendant conditions,

$$n\hat\sigma_a^2 = \sum_i\sum_j\sum_k \big[X_{ijk} - \tilde m - \tilde b_j - \tilde c_k - \widetilde{(ab)}_{ij} - \widetilde{(ac)}_{ik} - \widetilde{(bc)}_{jk}\big]^2.$$

Finally the sum of squares for partners is given by

$$n\hat\sigma_a^2 - n\hat\sigma^2.$$


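The least squares estimates of the constants for substitution in $n\hat\sigma^2$
are

$$\hat m = \frac{X_{\cdot\cdot\cdot}}{rp(p-1)},$$

$$\hat a_i = \frac{(p-1)X_{i\cdot\cdot} + X_{\cdot i\cdot} - X_{\cdot\cdot\cdot}}{rp(p-2)},$$

$$\hat b_j = \frac{(p-1)X_{\cdot j\cdot} + X_{j\cdot\cdot} - X_{\cdot\cdot\cdot}}{rp(p-2)},$$

$$\hat c_k = \frac{X_{\cdot\cdot k}}{p(p-1)} - \frac{X_{\cdot\cdot\cdot}}{rp(p-1)},$$

$$\widehat{(ab)}_{ij} = \frac{X_{ij\cdot}}{r} - \frac{(p-1)(X_{i\cdot\cdot} + X_{\cdot j\cdot}) + X_{\cdot i\cdot} + X_{j\cdot\cdot}}{rp(p-2)} + \frac{X_{\cdot\cdot\cdot}}{r(p-1)(p-2)},$$

$$\widehat{(ac)}_{ik} = \frac{(p-1)X_{i\cdot k} + X_{\cdot ik} - X_{\cdot\cdot k}}{p(p-2)} - \hat a_i,$$

$$\widehat{(bc)}_{jk} = \frac{(p-1)X_{\cdot jk} + X_{j\cdot k} - X_{\cdot\cdot k}}{p(p-2)} - \hat b_j.$$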
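As a cross-check on the main-effect estimates, the Python sketch below (an illustration, not part of the paper) evaluates $\hat m = X_{\cdot\cdot\cdot}/rp(p-1)$, $\hat a_i = [(p-1)X_{i\cdot\cdot} + X_{\cdot i\cdot} - X_{\cdot\cdot\cdot}]/rp(p-2)$ and $\hat b_j = [(p-1)X_{\cdot j\cdot} + X_{j\cdot\cdot} - X_{\cdot\cdot\cdot}]/rp(p-2)$ from the partner and protagonist totals of Tables 3 and 4. It then verifies that summing the fitted values $r(\hat m + \hat a_i + \hat b_j)$ over the $p - 1$ cells of each row and column reproduces the corresponding total, as the normal equations require when the $(ab)$ terms sum to zero. Variable names are illustrative.

```python
# Partner totals X_{i..} and protagonist totals X_{.i.}, animals in the order
# Banka, Falla, Fanny, Flora, Jed, Jent, Karla (Tables 3 and 4).
PARTNER_TOTALS = [5235, 5038, 4962, 5164, 5457, 5401, 5496]
PROTAGONIST_TOTALS = [4565, 6512, 6374, 4991, 4324, 4659, 5328]
p, r = 7, 5


def main_effect_estimates(part, prot, p, r):
    grand = sum(part)                                   # X_{...}
    m_hat = grand / (r * p * (p - 1))
    a_hat = [((p - 1) * xi + xdi - grand) / (r * p * (p - 2))
             for xi, xdi in zip(part, prot)]
    b_hat = [((p - 1) * xdj + xj - grand) / (r * p * (p - 2))
             for xj, xdj in zip(part, prot)]
    return m_hat, a_hat, b_hat


m_hat, a_hat, b_hat = main_effect_estimates(PARTNER_TOTALS, PROTAGONIST_TOTALS, p, r)

# Row i (partner i) has cells for every protagonist j != i; column j has cells
# for every partner i != j. The fitted main-effect parts should add up to the totals.
row_fit = [r * ((p - 1) * (m_hat + a_hat[i]) + sum(b_hat) - b_hat[i]) for i in range(p)]
col_fit = [r * ((p - 1) * (m_hat + b_hat[j]) + sum(a_hat) - a_hat[j]) for j in range(p)]
print([round(x, 6) for x in row_fit])
print([round(x, 6) for x in col_fit])
```

The printed lists should match the partner and protagonist totals, and the estimates $\hat a_i$ and $\hat b_j$ each sum to zero.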




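The derivation of $n\hat\sigma^2$ is greatly simplified if one notes that it is equal to

$$n\hat\sigma^2 = \sum_i\sum_j\sum_k X_{ijk}\big[X_{ijk} - \hat m - \hat a_i - \hat b_j - \hat c_k - \widehat{(ab)}_{ij} - \widehat{(ac)}_{ik} - \widehat{(bc)}_{jk}\big].$$

The derivation of $n\hat\sigma_a^2$, when the $a_i$ are set equal to zero, leads to the
following estimates of the constants:

$$\tilde m = \hat m,\quad \tilde c_k = \hat c_k,\quad \widetilde{(ab)}_{ij} = \widehat{(ab)}_{ij},\quad \widetilde{(ac)}_{ik} = \widehat{(ac)}_{ik},\quad \widetilde{(bc)}_{jk} = \widehat{(bc)}_{jk},$$

but

$$\tilde b_j = \frac{X_{\cdot j\cdot}}{r(p-1)} - \frac{X_{\cdot\cdot\cdot}}{rp(p-1)}.$$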
In deriving these estimates, the Lagrange multipliers, by which the
conditions on the constants are introduced, play a very important part,
and the mathematics becomes somewhat tricky. For example, obtaining the
estimate $\widetilde{(ab)}_{ij}$ involves the expression

$$\sum_i\sum_j\sum_k \big[X_{ijk} - m - b_j - c_k - (ab)_{ij} - (ac)_{ik} - (bc)_{jk}\big]^2 + 2\sum_i \lambda_{i\cdot\cdot}\sum_j (ab)_{ij} + 2\sum_j \lambda_{\cdot j\cdot}\sum_i (ab)_{ij}.$$

Noting that $\tilde m + \tilde b_j = X_{\cdot j\cdot}/r(p-1)$, and using the conditions on the
remaining constants, the least squares equations for evaluating the
$\widetilde{(ab)}_{ij}$ are

$$r\,(ab)_{ij} + \lambda_{i\cdot\cdot} + \lambda_{\cdot j\cdot} = X_{ij\cdot} - \frac{X_{\cdot j\cdot}}{p-1}.$$

The Lagrange multipliers are not zero here as they are in many of the
other derivations, but they can be eliminated by considering the related
sums of these equations over $j \neq i$ and over $i \neq j$. Elimination of the Lagrange
multipliers and evaluation of the $\widetilde{(ab)}_{ij}$ is accomplished by use of the following
relations:

$$(p-1)\lambda_{i\cdot\cdot} + \sum_{j \neq i}\lambda_{\cdot j\cdot} = \frac{(p-1)X_{i\cdot\cdot} + X_{\cdot i\cdot} - X_{\cdot\cdot\cdot}}{p-1},$$

$$\sum_{i \neq j}\lambda_{i\cdot\cdot} + (p-1)\lambda_{\cdot j\cdot} = 0,$$

$$\sum_i \lambda_{i\cdot\cdot} + \sum_j \lambda_{\cdot j\cdot} = 0,$$

and substitution of these relations, and the preceding ones, in the least
squares equations above. Considerations similar to the above lead to the
remaining sums of squares.

The reader should note that the sums of squares in formulas (1) through
(7) do not total to the sum of squares of deviations from the grand mean of
all observations. This is a characteristic of the nonorthogonal case in analysis
of variance, of which the present analysis is an example.

In order that the seven animals may be regarded as a random sample 
from a population of similar chimpanzees, the model is extended so that all 
components involving the animals will be variables which are normally and 
independently distributed with mean zero and variances to be defined. 
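Using the notation $V(X)$ for variance of $X$, we shall write

$$V(a_i) = \sigma_a^2,\quad V(b_j) = \sigma_b^2,\quad V[(ab)_{ij}] = \sigma_{ab}^2,\quad V[(ac)_{ik}] = \sigma_{ac}^2,\quad V[(bc)_{jk}] = \sigma_{bc}^2,\quad V(e_{ijk}) = \sigma^2;$$

$m$ and $c_k$ are constants as before.
The following conditions are imposed on the components:

$$\sum_k c_k = 0,\qquad \sum_k (ac)_{ik} = 0 \text{ for each } i,\qquad \sum_k (bc)_{jk} = 0 \text{ for each } j.$$

The mean squares using this model are derived from formulas (1) through
(7). The expected values of the mean squares are

$$\begin{aligned}
E(\text{MS partners}) &= \sigma^2 + r\sigma_{ab}^2 + \frac{rp(p-2)}{p-1}\,\sigma_a^2,\\
E(\text{MS protagonists}) &= \sigma^2 + r\sigma_{ab}^2 + \frac{rp(p-2)}{p-1}\,\sigma_b^2,\\
E(\text{MS replications}) &= \sigma^2 + \frac{r(p-1)}{r-1}\left(\sigma_{ac}^2 + \sigma_{bc}^2\right) + \frac{p(p-1)}{r-1}\sum_k c_k^2,\\
E(\text{MS partner} \times \text{protagonist}) &= \sigma^2 + r\sigma_{ab}^2,\\
E(\text{MS partner} \times \text{replication}) &= \sigma^2 + \frac{rp(p-2)}{(r-1)(p-1)}\,\sigma_{ac}^2,\\
E(\text{MS protagonist} \times \text{replication}) &= \sigma^2 + \frac{rp(p-2)}{(r-1)(p-1)}\,\sigma_{bc}^2,\\
E(\text{MS error}) &= \sigma^2.
\end{aligned}$$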




The formulas for testing main effects against interactions, (8) through
(10), are derived from these expectations by methods described in [1, chap.
28]. From these expectations the reliability formulas shown above can also
be derived. The values of $s^2$, $s_{ac}^2$, $s_{bc}^2$, etc., are obtained by solving the expressions
for the expectations for the variances $\sigma^2$, $\sigma_{ac}^2$, $\sigma_{bc}^2$, etc., successively,
and then substituting sample values for the population values.

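To derive the formula for replication reliability, note that from the second model the mean for protagonist $j$ at replication $k$ is

$$\frac{X_{\cdot jk}}{p-1} = m + \frac{\sum_i a_i}{p-1} + b_j + c_k + \frac{\sum_i (ab)_{ij}}{p-1} + \frac{\sum_i (ac)_{ik}}{p-1} + (bc)_{jk} + \frac{\sum_i e_{ijk}}{p-1},$$

where the sums run over the $p - 1$ partners $i \ne j$. Consider now the mean for the same protagonist at a parallel replication $k'$, so that

$$\frac{X_{\cdot jk'}}{p-1} = m + \frac{\sum_i a_i}{p-1} + b_j + c_{k'} + \frac{\sum_i (ab)_{ij}}{p-1} + \frac{\sum_i (ac)_{ik'}}{p-1} + (bc)_{jk'} + \frac{\sum_i e_{ijk'}}{p-1}.$$

Then

$$E\left(\frac{X_{\cdot jk}}{p-1}\right) = m + c_k, \qquad E\left(\frac{X_{\cdot jk'}}{p-1}\right) = m + c_{k'}.$$

Using these expectations and noting that $X_{\cdot jk}/(p-1)$ and $X_{\cdot jk'}/(p-1)$ differ only in the terms containing $k$ and $k'$ as subscripts,

$$\operatorname{Cov}\left(\frac{X_{\cdot jk}}{p-1}, \frac{X_{\cdot jk'}}{p-1}\right) = \sigma_b^2 + \frac{\sigma_a^2 + \sigma_{ab}^2}{p-1}.$$

By the model, the expectation of the square of a variable is its variance, whereas the expectation of the product of two different variables is zero. Also

$$V\left(\frac{X_{\cdot jk}}{p-1}\right) = V\left(\frac{X_{\cdot jk'}}{p-1}\right) = \sigma_b^2 + \sigma_{bc}^2 + \frac{\sigma_a^2 + \sigma_{ab}^2 + \sigma_{ac}^2 + \sigma_e^2}{p-1}.$$

The covariance and the variances jointly lead to the replication reliability (17).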


To obtain the formula for partner reliability, by the second model

$$\frac{X_{ij\cdot}}{r} = m + a_i + b_j + (ab)_{ij} + \frac{\sum_k e_{ijk}}{r},$$

since in summing over all replications the terms containing $k$ vanish. Similarly, for a different partner $i'$,

$$\frac{X_{i'j\cdot}}{r} = m + a_{i'} + b_j + (ab)_{i'j} + \frac{\sum_k e_{i'jk}}{r}.$$

Formula (18) follows readily. To obtain formula (19), correlate

$$X_{ijk} = m + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + e_{ijk}$$

with the corresponding expression for $X_{i'jk'}$, where $i'$ and $k'$ differ from $i$ and $k$.


REFERENCES 


[1] Kempthorne, O. The design and analysis of experiments. New York: Wiley, 1952.

[2] Kolodziejczyk, S. On an important class of statistical hypotheses. Biometrika, 1935, 27, 161-190.

[3] Quenouille, H. M. Design and analysis of experiments. New York: Hafner, 1952. 

[4] Walker, H. M. and Lev, J. Statistical inference. New York: Holt, 1953. 

[5] Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1944. 


Manuscript received 10/11/55. 
Revised manuscript received 2/10/56. 


PSYCHOMETRIKA—VOL. 22, NO. 1 
MARCH, 1957 


ON THE RANKING PROBLEM 


ISADORE BLUMEN 


CORNELL UNIVERSITY 


Observed rankings of objects can be treated as arising from a time-dependent probability process. Under such circumstances, the associations observed are an indication of the character of this underlying process. In the particular example treated in some detail here, a quantity related to Kendall's tau is found to have an important role, and its properties are examined.


1. Introduction 


One of the frequent problems which research workers in the social sciences raise with statisticians involves relations between variates when the observations consist of rankings of the units which are observed. There are many procedures for testing the hypothesis that the rankings are independent in a probability sense. But the research worker has usually anticipated this possibility; he is really interested in measuring the concordance of these rankings. On this point the measures hitherto proposed have not proved completely satisfactory. Speaking generally, they suffer in varying degrees from difficulties of interpretation, from stringent assumptions about underlying measurements, or from the lack of an adequate sampling theory.

It is the purpose of this paper to suggest the usefulness of probability 
processes as the point of departure for some problems of this sort. Instead 
of viewing a particular ranking as one out of a hypothetically infinite set of 
random drawings of objects from some population, suppose that there is 
some process producing the possible rankings, and that each ranking has its 
probability determined by the character of the process. The observed rankings 
of the n objects then give some insight into the process which is producing 
the concordance. 

A particularly simple model will be used. Those who find it unrealistic and unreasonable as an approximation to their problems may try more complex ones. They are warned, however, that computational difficulties can easily become overwhelming with no corresponding gain in usefulness.


2. Single Judge and a Standard Ranking: The Model 


There are a number of ways to formulate the problem. For convenience, it will be assumed that a judge is asked to rank a collection of n objects which have some natural order. These objects could be students who are being ranked on ability, houses which are to be ranked on liveability, automobiles on beauty, and so on. The judge will generally begin with a tentative ranking and then compare objects whose rankings are close. He will adjust his rankings by comparison of close objects until satisfied or until required to report his ranking.

In this model, it is assumed that at each comparison the judge takes only those $(n - 1)$ pairs which differ by one rank and that he chooses one among these pairs at random, i.e., with probability $(n - 1)^{-1}$ for each pair. It is assumed that there is a preferred or standard order for the pair, and that the pair is put into this order with probability $p$, $0 \le p \le 1$, and in the reverse order with probability $q = 1 - p$. This process is repeated over and over: one among the $(n - 1)$ adjacent pairs is selected at random and assigned the standard order with probability $p$, without regard to the previous ordering.
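As an illustration only, the comparison process can be simulated directly. This sketch assumes the objects are labeled 1, ..., n so that the standard order is increasing; the function names `judge_step`, `inversions`, and `simulate` are hypothetical, not from the paper. `inversions` counts discordant pairs, which turns out to be the class subscript s used below.

```python
import random


def judge_step(ranking, p, rng):
    """One comparison of the judge process, applied in place.

    Objects are labeled 1..n, so the standard order is increasing.
    One of the n - 1 adjacent pairs is chosen with probability 1/(n - 1);
    it is put in standard order with probability p and in reverse order
    with probability q = 1 - p, regardless of its previous order.
    """
    k = rng.randrange(len(ranking) - 1)
    lo, hi = sorted((ranking[k], ranking[k + 1]))
    if rng.random() < p:
        ranking[k], ranking[k + 1] = lo, hi   # standard order
    else:
        ranking[k], ranking[k + 1] = hi, lo   # reversed order
    return ranking


def inversions(ranking):
    """Number of discordant pairs in the ranking."""
    n = len(ranking)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if ranking[i] > ranking[j])


def simulate(n, p, steps, seed=0):
    """Run `steps` comparisons starting from the inverse ranking."""
    rng = random.Random(seed)
    ranking = list(range(n, 0, -1))
    for _ in range(steps):
        judge_step(ranking, p, rng)
    return ranking
```

For example, `inversions(simulate(6, 0.75, 500, seed))` gives one draw of s after 500 comparisons; repeating over many seeds approximates the limit distribution discussed below.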

This is a Markov process. The essential probability characteristics which determine the shift from one ranking to another can be given by a transition matrix, each element of which is the probability that a ranking in one order will be changed to another order at each stage of the process.

Before writing this transition matrix, it will be convenient to group the $n!$ orders into classes. The first class, $S_0$, will have one ranking, the objects in a standard order. The second class, $S_1$, will consist of the $n - 1$ possible rankings obtained by permuting adjacent objects in the ranking $S_0$. If adjacent objects are then permuted in rankings in the class $S_1$, either a member of $S_0$ or a new ordering is obtained. All such new orderings form the class $S_2$. Similarly, $S_3$ is formed by permuting adjacent objects in $S_2$ and taking those rankings not in $S_1$; $S_4$ is formed from $S_3$, and so on. It can be shown that the number of such classes is $[n(n - 1)/2] + 1$.
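The class construction can be checked by brute force for small n. In this sketch (hypothetical helper names, standard library only), breadth-first search over adjacent transpositions assigns each ranking to its class. For n = 4 the class sizes come out as 1, 3, 5, 6, 5, 3, 1, and in every case the class subscript equals the number of discordant pairs in the ranking.

```python
from collections import deque


def inversions(ranking):
    """Number of discordant pairs in the ranking."""
    n = len(ranking)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if ranking[i] > ranking[j])


def class_index_by_bfs(n):
    """Map each ranking of 1..n to the subscript of its class S_s.

    S_0 is the standard order; each later class collects the rankings that
    first become reachable after one more adjacent transposition, which is
    the construction described in the text.
    """
    start = tuple(range(1, n + 1))
    level = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for k in range(n - 1):
            nxt = list(current)
            nxt[k], nxt[k + 1] = nxt[k + 1], nxt[k]
            nxt = tuple(nxt)
            if nxt not in level:          # a new ordering: it joins the next class
                level[nxt] = level[current] + 1
                queue.append(nxt)
    return level
```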

The transition matrix will be an $n!$ by $n!$ matrix in which the element in the $i$th row and $j$th column will be the probability that the $i$th of the $n!$ possible rankings of $n$ objects will be changed by the judge into the $j$th of the possible rankings. The matrix will be written so that the first row will give the probability that a judge, given the objects in the standard order (in class $S_0$), will change to any other order or continue to use the ranking. The first column, on the other hand, will give the probability that he will move from any ranking into $S_0$.

The second through the $n$th rows will give the probability of moving to any ranking from any one of the rankings in $S_1$. The subscripts $2, \ldots, n$ may be assigned in any arbitrary way to identify the rankings in $S_1$, and corresponding subscripts are used for columns 2 through $n$ in order to give the probabilities of moving into the corresponding rankings in $S_1$. The next rows belong to rankings in the class $S_2$, and so on to the one inverse ranking in $S_{n(n-1)/2}$.


The probability of moving from $S_0$ into any particular ranking in $S_1$ is $q(n - 1)^{-1}$. There is no possibility of moving into any other ranking in a single comparison, and all other transition probabilities in the first row, except the one in the first column, designated $m_{11}$, are zero. Since the sum of the elements in any row must be unity, $m_{11} = p$.

In comparing pairs when the ranking is in $S_1$, there are three possibilities: (i) If the one pair which is not in the natural order is reversed, then the new ranking will be in $S_0$. The probability of this is $p(n - 1)^{-1}$. (ii) If any other pair is reversed, then the new ranking will be in $S_2$. The probability of moving from one of the rankings in $S_1$ to any particular ranking in $S_2$ which can be reached from it by one permutation is $q(n - 1)^{-1}$. (iii) There will be no change with probability $[q + (n - 2)p](n - 1)^{-1}$, since the row sum must total to unity and all other probabilities in the second through the $n$th rows must be zero.

In general, the $n!$ by $n!$ probability matrix is partitioned into $\{[n(n - 1)/2] + 1\}^2$ sub-matrices corresponding to the classes described above. All sub-matrices lying more than one class away from the principal diagonal of sub-matrices have zero elements, since one permutation moves a ranking no more than one class at a time.

From any ranking, call it the $i$th, there are $(n - 1)$ possible other rankings which can be reached by one permutation. There will be some number $c_i$ $(0 \le c_i \le n - 1)$ of these which lead to the class with a subscript which is one lower than that of the $i$th ranking. The probability of arriving at any particular one of these $c_i$ rankings is $p(n - 1)^{-1}$.

There will be $n - c_i - 1$ possible rankings in the class with the next higher subscript than the $i$th ranking. The probability of any particular one of these is $q(n - 1)^{-1}$. Finally, the probability that there will be no change in a single comparison is

$$m_{ii} = [c_i q + (n - c_i - 1)p](n - 1)^{-1}. \qquad (2.1)$$

All other elements in the row corresponding to this ranking are zero.


For example, for $n = 3$,

$$M(p) = \begin{pmatrix}
p & q/2 & q/2 & 0 & 0 & 0\\
p/2 & (q+p)/2 & 0 & q/2 & 0 & 0\\
p/2 & 0 & (q+p)/2 & 0 & q/2 & 0\\
0 & p/2 & 0 & (q+p)/2 & 0 & q/2\\
0 & 0 & p/2 & 0 & (q+p)/2 & q/2\\
0 & 0 & 0 & p/2 & p/2 & q
\end{pmatrix}, \qquad (2.2)$$

where $M(p)$ is the transition matrix whose probabilities are determined by $p$.
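To check the entries of (2.1) and (2.2) mechanically, the following sketch (an illustration, not the author's procedure; names are hypothetical) builds $M(p)$ for any small n with exact fractions. It orders rankings by class and then lexicographically, so within $S_1$ and $S_2$ its row order may differ from the one printed above, which the text allows.

```python
from fractions import Fraction
from itertools import permutations


def inversions(ranking):
    """Number of discordant pairs: the class subscript s."""
    n = len(ranking)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if ranking[i] > ranking[j])


def transition_matrix(n, p):
    """Exact transition matrix M(p) for rankings of the objects 1..n.

    Rows and columns are ordered by class S_s, then lexicographically.
    Each of the n - 1 adjacent pairs is chosen with probability 1/(n - 1)
    and set to standard order with probability p, reversed with q = 1 - p.
    """
    p = Fraction(p)
    q = 1 - p
    states = sorted(permutations(range(1, n + 1)),
                    key=lambda r: (inversions(r), r))
    index = {r: i for i, r in enumerate(states)}
    size = len(states)
    M = [[Fraction(0)] * size for _ in range(size)]
    for r in states:
        for k in range(n - 1):
            lo, hi = sorted(r[k:k + 2])
            for pair, prob in (((lo, hi), p), ((hi, lo), q)):
                nxt = r[:k] + pair + r[k + 2:]
                M[index[r]][index[nxt]] += prob / (n - 1)
    return states, M
```

Every row sums to one, the diagonal entries agree with (2.1), and for p = 1/2 the matrix is symmetric, as stated below.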




For $p = 1/2$ and all $n$, $M(p)$ is symmetric. Furthermore, it is always possible after finitely many stages of the process (i.e., after finitely many comparisons) for any of the possible rankings to arise. Hence, from the theory of Markov chains [1], if a sufficiently large number of comparisons is to be made, the probability of having any ranking will be the same as that for having any other ranking, i.e., $(n!)^{-1}$. More precisely, when $p = 1/2$ the limit of the probability that a judge will have the objects ranked in a particular way is $(n!)^{-1}$ no matter what the initial tentative ranking.
When $p \ne 1/2$, the limit probabilities can be obtained from the theory of Markov chains by solving for $t_j$ in


$$\sum_{i=1}^{n!} t_i m_{ij} = t_j \quad (j = 1, 2, \ldots, n!) \qquad (2.3)$$

and

$$\sum_{i=1}^{n!} t_i = 1,$$

where $m_{ij}$ is the element in the $i$th row and $j$th column of $M(p)$. (When $p$ is zero or one there is a trivial special case which is excluded hereafter.) In particular, for $j = 1$,


$$p t_1 + p(n - 1)^{-1}(t_2 + t_3 + \cdots + t_n) = t_1. \qquad (2.4)$$

Rewriting,

$$q t_1 = (n - 1)^{-1} p\,(t_2 + t_3 + \cdots + t_n), \qquad (2.5)$$

for which the relationship $t_{k_1} = q p^{-1} t_1$ is a solution when $k_1 = 2, 3, \ldots, n$.


For $j = 2, 3, \ldots, (n! - 1)$, the equations (2.3) can be typified by the one for $j = 2$. This equation is

$$t_1 q(n - 1)^{-1} + t_2[q + (n - 2)p](n - 1)^{-1} + (t_{n+1} + t_{n+2} + \cdots + t_{2n-2})\,p(n - 1)^{-1} = t_2. \qquad (2.6)$$

By straightforward substitution in (2.6), it is seen that $t_2 = q p^{-1} t_1$ and $t_{k_2} = q p^{-1} t_2$ are solutions, where $k_2 = n + 1, n + 2, \ldots, 2n - 2$.

There are, in general, more than $(n - 2)$ rankings in $S_2$. It is clear, however, that solutions of (2.3) for any ranking in $S_2$ are $t_{k_2} = q^2 p^{-2} t_1$, where $k_2$ ranges over the rankings in $S_2$. Similarly, for any ranking in $S_3$, $t_{k_3} = q^3 p^{-3} t_1$ are solutions to (2.3), and generally $t_{k_s} = q^s p^{-s} t_1$, where $k_s$ ranges over the states in the class with subscript $s$. If the condition that $\sum t_i = 1$ is added, the above solutions are unique.
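A sketch that checks this solution numerically, under the stated assumptions (hypothetical names, exact arithmetic with Python's `fractions`). It sets $t_k \propto (q/p)^s$ for each ranking in class $S_s$ and verifies that one application of the transition rules leaves the distribution unchanged, i.e., that (2.3) holds.

```python
from fractions import Fraction
from itertools import permutations


def inversions(ranking):
    """Number of discordant pairs: the class subscript s."""
    n = len(ranking)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if ranking[i] > ranking[j])


def limit_probabilities(n, p):
    """t_k proportional to (q/p)**s for a ranking in class S_s, summing to one."""
    p = Fraction(p)
    r = (1 - p) / p
    weights = {perm: r ** inversions(perm)
               for perm in permutations(range(1, n + 1))}
    total = sum(weights.values())
    return {perm: w / total for perm, w in weights.items()}


def apply_transition(dist, n, p):
    """Return sum_i t_i m_ij for every ranking j: one step of the chain."""
    p = Fraction(p)
    q = 1 - p
    out = {perm: Fraction(0) for perm in dist}
    for perm, mass in dist.items():
        for k in range(n - 1):
            lo, hi = sorted(perm[k:k + 2])
            for pair, prob in (((lo, hi), p), ((hi, lo), q)):
                out[perm[:k] + pair + perm[k + 2:]] += mass * prob / (n - 1)
    return out
```

For n = 3 and p = 2/3 the standard ranking receives limit probability 8/21.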
The limit probabilities which are the solution to (2.3) determine the behavior of the observed rankings. It is this probability distribution (see Tables 1 and 2 below for $n = 3$ and 4) which is required for the empirical study of the behavior of the process described previously. Things are simplified further if it is noted that, for fixed $s$ and $p$, the limit probabilities of all members in the class $S_s$, $s = 0, 1, 2, \ldots, n(n - 1)/2$, are the same. That is, $s$ is a sufficient statistic for $p$, and we can restrict ourselves to the distribution of $s$.

When $p = 1/2$, this distribution is basically that of Kendall's tau, since $\tau = 1 - 4s[n(n - 1)]^{-1}$. The probability generating function for this case is essentially given by Kendall [2] and is

$$E(x^s) = \frac{1}{n!}(1)(1 + x)(1 + x + x^2)\cdots(1 + x + \cdots + x^{n-1}) = \frac{1}{n!}\prod_{i=1}^{n}\frac{1 - x^i}{1 - x}, \qquad |x| < 1. \qquad (2.7)$$
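A small sketch (hypothetical function names, standard library only) that expands the product in (2.7) and checks it against direct enumeration of all n! rankings; `kendall_tau` applies the relation between s and tau stated above.

```python
from itertools import permutations
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kendall_counts(n):
    """Coefficients a_s of (1)(1 + x)(1 + x + x^2)...(1 + x + ... + x^(n-1))."""
    coeffs = [1]
    for i in range(1, n + 1):
        coeffs = poly_mul(coeffs, [1] * i)
    return coeffs


def null_distribution(n):
    """P{s} when p = 1/2, read off the generating function (2.7)."""
    return [a / factorial(n) for a in kendall_counts(n)]


def counts_by_enumeration(n):
    """The same counts obtained by listing all n! rankings (small n only)."""
    counts = [0] * (n * (n - 1) // 2 + 1)
    for perm in permutations(range(n)):
        s = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        counts[s] += 1
    return counts


def kendall_tau(s, n):
    """tau = 1 - 4 s / [n (n - 1)]."""
    return 1 - 4 * s / (n * (n - 1))
```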


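Hence, from (2.7) and the solutions to the equations like (2.6), the moment generating function of $s$ is, for $0 < p < 1$,

$$\phi(\theta) = E(e^{\theta s}) = \prod_{i=1}^{n}\frac{1 + re^{\theta} + \cdots + (re^{\theta})^{i-1}}{1 + r + \cdots + r^{i-1}} = \prod_{i=1}^{n}\frac{(1 - r)\,[1 - (re^{\theta})^{i}]}{(1 - r^{i})(1 - re^{\theta})}, \qquad (2.8)$$

where $|re^{\theta}| < 1$ and $r = qp^{-1}$.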
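Under the same assumptions, the limit distribution of s for general p is $P\{s\} = a_s r^s / \sum_t a_t r^t$, where $a_s$ are the coefficients in (2.7); this is the distribution that (2.8) generates. The sketch below (hypothetical names) computes it exactly and compares the closed form (2.8) with the direct sum for one value of θ. For n = 3 and p = 2/3 it gives 8/21, 8/21, 4/21, 1/21, which round to the entries of Table 1.

```python
import math
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kendall_counts(n):
    """Coefficients a_s of (1)(1 + x)...(1 + x + ... + x^(n-1))."""
    coeffs = [1]
    for i in range(1, n + 1):
        coeffs = poly_mul(coeffs, [1] * i)
    return coeffs


def class_distribution(n, p):
    """Limit distribution of s: P{s} = a_s r^s / sum_t a_t r^t, with r = q/p."""
    p = Fraction(p)
    r = (1 - p) / p
    w = [a * r ** s for s, a in enumerate(kendall_counts(n))]
    total = sum(w)
    return [x / total for x in w]


def mgf_closed_form(theta, n, r):
    """Right-hand side of (2.8); requires r != 1 and r * e^theta != 1."""
    z = r * math.exp(theta)
    value = 1.0
    for i in range(1, n + 1):
        value *= (1 - r) * (1 - z ** i) / ((1 - r ** i) * (1 - z))
    return value
```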


Before proceeding further, it should be noted that the process discussed here is not the only process which gives limit distributions whose generating functions are (2.7) and (2.8). For example, any Markov process whose transition matrix had column totals equal to unity would give (2.7). Similarly, other simple processes lead to (2.8). (See, for example, [3].) Such processes have not been explored here because none seemed more reasonable as simple approximations to useful situations than the one treated. And finally, as pointed out earlier, more complex distributions can also follow from other assumptions, but the exploration of these was beyond the purposes of this paper.


3. Single Judge and Standard Ranking: Inference Problems 


Tests which involve $\tau$ in the usual way have optimum properties. Thus, to test the null hypothesis that an observed ranking arose from a process in which there is no tendency toward conformity with a natural order, $p = 1/2$, against alternatives that there is a tendency toward conformity, $p > 1/2$, a uniformly most powerful test is to reject the null hypothesis whenever $s$ is small (equivalently, whenever $\tau$ is large). Similar remarks apply to tests of $p < 1/2$ against $p = 1/2$.

The more interesting problems involve the estimation of the error, i.e., the estimation of $p$. It will first be shown that $\tau$ is not a satisfactory estimator of error when the process above is relevant.
Using the moment generating function of $s$ (2.8),

$$E(s) = \left.\frac{\partial \log\phi}{\partial\theta}\right|_{\theta=0} = \frac{nr}{1 - r} - \sum_{i=1}^{n}\frac{i r^{i}}{1 - r^{i}}. \qquad (3.1)$$

(We restrict ourselves to $r < 1$. The case $r > 1$ goes through in a parallel way, and the case $r = 1$ is treated by taking advantage of the continuity of the moment generating function.) Now,

$$E(\tau) = 1 - \frac{4E(s)}{n(n - 1)}. \qquad (3.2)$$

(3.2) can be bounded by

$$E(\tau) \le 1 - \frac{4r}{(n - 1)(1 - r)} + \frac{4r}{n(n - 1)(1 - r)^{3}} \qquad (3.3)$$

and

$$E(\tau) \ge 1 - \frac{4r}{(n - 1)(1 - r)}. \qquad (3.4)$$

Hence, $\lim_{n\to\infty} E(\tau) = 1$ and, since $\tau$ cannot exceed unity, $\operatorname{plim}_{n\to\infty}\tau = 1$ whenever $r < 1$. Similarly, it can be shown that when $r > 1$, $\operatorname{plim}_{n\to\infty}\tau = -1$.

When $r = 1$, we can obtain $E(\tau) = 0$ and compute the variance by using (2.8) and L'Hospital's rule. It follows that $\operatorname{plim}_{n\to\infty}\tau = 0$ whenever $r = 1$.
Why $\tau$ has the peculiar property of converging to only these three values is seen from (2.7) and (2.8). For $r < 1$, the distribution of $s$ is that of a convolution of $n$ independent truncated geometric distributions, each with parameter $r$. Hence, $s/n$ will have the ordinary sort of behavior and, in particular, will have as its limit the normal distribution whose mean is

$$E(s/n) = (1 - p)(2p - 1)^{-1} \qquad (3.5)$$

and whose variance is

$$\operatorname{var}(s/n) = p(1 - p)(2p - 1)^{-2}n^{-1}. \qquad (3.6)$$


When $r = 1$, the distribution of $s$ is that of the convolution of $n$ independent chance variables, each having a discrete rectangular distribution; i.e., the chance variable $x_i$ has the distribution $P\{x_i = j\} = (1 + i)^{-1}$ for $j = 0, 1, 2, \ldots, i$ $(i = 0, 1, \ldots, n - 1)$. Hence, $\sum_i x_i = s$ must be divided by a term of order $n^2$, as is done in computing $\tau$, if the mean of the limiting distribution is to be a bounded positive quantity.
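(3.1) through (3.5) can be checked numerically. This sketch (hypothetical names, floating point, assuming 0 < r < 1) evaluates (3.1), compares it with the mean computed from the exact distribution for a small n, evaluates the bounds (3.3) and (3.4) as written above, and shows E(s)/n approaching (1 - p)/(2p - 1) for large n.

```python
def expected_s(n, r):
    """E(s) from (3.1): n r/(1 - r) - sum_{i=1}^{n} i r^i/(1 - r^i), 0 < r < 1."""
    return n * r / (1 - r) - sum(i * r ** i / (1 - r ** i) for i in range(1, n + 1))


def expected_s_direct(n, r):
    """E(s) from P{s} proportional to a_s r^s (small n only)."""
    a = [1]
    for i in range(1, n + 1):                 # multiply by 1 + x + ... + x^(i-1)
        nxt = [0] * (len(a) + i - 1)
        for s, c in enumerate(a):
            for j in range(i):
                nxt[s + j] += c
        a = nxt
    weights = [c * r ** s for s, c in enumerate(a)]
    return sum(s * w for s, w in enumerate(weights)) / sum(weights)


def expected_tau(n, r):
    """E(tau) = 1 - 4 E(s)/[n(n - 1)], equation (3.2)."""
    return 1 - 4 * expected_s(n, r) / (n * (n - 1))


def tau_bounds(n, r):
    """Lower bound (3.4) and upper bound (3.3) on E(tau), for 0 < r < 1."""
    lower = 1 - 4 * r / ((n - 1) * (1 - r))
    upper = lower + 4 * r / (n * (n - 1) * (1 - r) ** 3)
    return lower, upper
```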

Since $\tau$ is not a satisfactory measure, attention is turned to estimation procedures for $p$ and functions of $p$. The problem will be complicated by the difficulty in writing the probability distribution for $s$ explicitly. For small $n$, however, Table 16.3 in Kendall [2] can be used to write the exact probabilities and to compare various estimation procedures in this way, while for large $n$ the asymptotic normality of $s$ can be used.

A reasonable requirement for an estimate of $p$ when $n$ is small might well be unbiasedness. The method of obtaining unbiased estimates can be illustrated for $n = 3$. When $r < 1$, the distribution of $s$ is

$$P\{s = 0\} = b, \quad P\{s = 1\} = 2rb, \quad P\{s = 2\} = 2r^2 b, \quad P\{s = 3\} = r^3 b,$$

where $b = (1 + 2r + 2r^2 + r^3)^{-1}$. Hence, if $p_0^*$ is the estimator used when $s = 0$, $p_1^*$ when $s = 1$, and so on, then from the definition of unbiasedness

$$p_0^* b + 2p_1^* rb + 2p_2^* r^2 b + p_3^* r^3 b = p = (1 + r)^{-1}.$$

Multiplying both sides by $b^{-1} = (1 + r)(1 + r + r^2)$ and equating coefficients of like powers of $r$,

$$p_0^* = 1, \quad p_1^* = 1/2, \quad p_2^* = 1/2, \quad p_3^* = 0.$$

It is not difficult to show that these estimates are unbiased for all values of $p$.


In general, let $a_s$ be the coefficient of $x^s$ in the polynomial $(1)(1 + x)(1 + x + x^2)\cdots(1 + x + \cdots + x^{n-1})$, and let $b_s$ be the coefficient of $x^s$ in $(1 + x + x^2)(1 + x + x^2 + x^3)\cdots(1 + x + \cdots + x^{n-1})$. Then the unbiased estimates are

$$p_s^* = \frac{b_s}{a_s}, \qquad s = 0, 1, \ldots, n(n - 1)/2, \qquad (3.7)$$

writing $b_{n(n-1)/2} = 0$.

This estimation procedure has the property of consistency, but as $n$ becomes large the difficulties in obtaining $b_s$ are the same as for $a_s$. A somewhat easier consistent estimate might, therefore, be used. One such estimate is

$$\hat p = \begin{cases}
\dfrac{n + s - 3}{n + 2s - 3} & \text{(a) when } \dfrac{s}{n(n-1)} < \dfrac14 \text{ and } n > 2,\\[2ex]
1 - \dfrac{n + s' - 3}{n + 2s' - 3} & \text{(b) when } \dfrac{s}{n(n-1)} > \dfrac14 \text{ and } n > 2,\\[2ex]
\dfrac12 & \text{(c) when } \dfrac{s}{n(n-1)} = \dfrac14,
\end{cases} \qquad (3.8)$$

where

$$s' = \frac{n(n - 1)}{2} - s.$$

This estimation procedure is based on the fact that $s$ has, approximately, a negative binomial distribution, and (a) and (b) are essentially maximum likelihood estimates when $r < 1$ and $r > 1$, respectively, with a correction to reduce bias for small $n$. Since, when $n$ is large, the conditions for use of (a) and (b) hold with probability arbitrarily close to unity, the estimates are consistent for $r \ne 1$.
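The unbiased estimates (3.7) are mechanical to compute. This sketch (hypothetical names) forms $a_s$ and $b_s$ by polynomial multiplication and returns $p_s^* = b_s/a_s$ as exact fractions; for n = 4 it gives 1, 2/3, 3/5, 1/2, 2/5, 1/3, 0, the p* column of Table 2, and the test checks unbiasedness exactly for several p.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def kendall_counts(n):
    """a_s: coefficients of (1)(1 + x)...(1 + x + ... + x^(n-1))."""
    coeffs = [1]
    for i in range(1, n + 1):
        coeffs = poly_mul(coeffs, [1] * i)
    return coeffs


def unbiased_estimates(n):
    """p*_s = b_s / a_s from (3.7), for n >= 2; b_s is 0 beyond the degree of B."""
    a = kendall_counts(n)
    b = [1]
    for i in range(3, n + 1):                 # (1 + x + x^2)...(1 + ... + x^(n-1))
        b = poly_mul(b, [1] * i)
    b = b + [0] * (len(a) - len(b))           # b_{n(n-1)/2} = 0
    return [Fraction(bs, as_) for bs, as_ in zip(b, a)]
```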

For $r = 1$, (a) and (b) will be used with equal probability. If (a) is used, then the conditional probability

$$P\{|\hat p - 1/2| > \epsilon \mid \text{(a)}\} \to 0, \qquad (3.9)$$

since

$$E(s) = \frac{n(n - 1)}{4} \quad \text{and} \quad \operatorname{var}(s) = \frac{n(n - 1)(2n + 5)}{72}.$$

A similar relation holds for (b), and together with (c), proves consistency.
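The estimator (3.8), in the form reconstructed here, can be coded in a few lines. The case conditions follow the display above; where the fraction is 0/0 (for n = 3 at s = 0 and s = 3), the sketch returns `None` rather than guessing a value. For n = 4 it reproduces the p* column of Table 2, consistent with the remark below that the two estimates coincide. The function name is hypothetical.

```python
from fractions import Fraction


def p_hat(s, n):
    """Estimator (3.8) in exact arithmetic; None where the expression is 0/0."""
    ratio = Fraction(s, n * (n - 1))
    if ratio == Fraction(1, 4):
        return Fraction(1, 2)                        # case (c)
    if ratio < Fraction(1, 4):
        num, den = n + s - 3, n + 2 * s - 3          # case (a)
        return Fraction(num, den) if den else None
    s_prime = n * (n - 1) // 2 - s                   # case (b) uses s'
    num, den = n + s_prime - 3, n + 2 * s_prime - 3
    return 1 - Fraction(num, den) if den else None
```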

The efficiency of these estimates is unknown. The probability distributions of $s$ for $n = 3$ and $n = 4$ are given below for three values of $p$, together with the estimates $p^*$ and their variances. (Our estimate $\hat p$ coincides with $p^*$ for these values of $n$.)


When $p \ne 1/2$ and $n$ is large, the estimator is approximately normally distributed, and it can be shown that

$$\operatorname{var}(\hat p) \sim p(1 - p)(2p - 1)^2/n. \qquad (3.10)$$


TABLE 1

Probability Distribution of s
and Estimate of p for n = 3

                          P{s}
s        p*_s     p = 1/2   p = 2/3   p = 3/4
0        1.0       .167      .381      .519
1        0.5       .333      .381      .346
2        0.5       .333      .190      .115
3        0.0       .167      .048      .019
Var (p*)           .084      .075      .072


TABLE 2

Probability Distribution of s
and Estimate of p for n = 4

                          P{s}
s        p*_s     p = 1/2   p = 2/3   p = 3/4
0        1.00      .042      .203      .350
1        0.67      .125      .305      .350
2        0.60      .208      .254      .195
3        0.50      .250      .152      .078
4        0.40      .208      .064      .022
5        0.33      .125      .019      .004
6        0.00      .042      .003      .000
Var (p*)           .032      .033      .038


This approximation is useful only when $n$ is at least moderately large. Bias seems to be relatively unimportant. For $n = 6$ and $p = 2/3$, for example, there is an upward bias of .016. When $p = 1/2$ there is no bias. In addition, in this case it can be shown that when $n$ is large $\hat p$ is approximately $1/2 \pm 1/n$.


4. Several Processes 


Now consider the case where there are several processes, each with the same standard ranking. In practice, these processes may arise from $m$ judges attempting to rank $n$ students on intelligence, $n$ cities which have $m$ different indices of their development, and so on. The standard ranking is unknown to the statistician. He is asked to estimate this ranking and to give some indication of the usefulness of the judges or indices for ranking purposes.

Each of the $m$ estimates of the standard ranking is given independently and assumed to be derived by a process which gives the probability distribution of section 2. The probability of any particular set of $m$ rankings is, therefore,

$$\prod_{i=1}^{m} P\{s_i\} = \prod_{i=1}^{m}\frac{(1 - r_i)^n\, r_i^{s_i}}{\prod_{j=1}^{n}(1 - r_i^{j})}, \qquad (4.1)$$

where $P\{s_i\}$ is the probability of the particular ranking arising from process $i$ $(i = 1, 2, \ldots, m)$, for which $r_i$ is the error parameter of the $i$th judge; that ranking lies in the class which is $s_i$ permutations from the standard ranking.

In particular, if $r_1 = r_2 = \cdots = r_m = r$,

$$\prod_{i=1}^{m} P\{s_i\} = \frac{(1 - r)^{mn}\, r^{s}}{\left[\prod_{j=1}^{n}(1 - r^{j})\right]^{m}}, \qquad (4.2)$$

where $s = s_1 + s_2 + \cdots + s_m$, with the restriction that $r < 1$.

A maximum likelihood estimate of the standard ranking is obtained if a ranking which minimizes the total number of permutations from the estimated standard order is chosen. This is very simple in practice. For each process, assign rank 1 to the "best" object, rank 2 to the next, and so on to rank $n$ for the last. Obtain the total (or average) score for each object over all processes. A maximum likelihood estimate of the natural order is given by assigning 1st rank to the object with the lowest score, 2nd to the object with the next lowest score, and so on to the $n$th rank for the object with the highest total score. In the event that there are ties, this application of the method of maximum likelihood does not give a unique answer, and a randomization procedure which gives equal probability among the possible rankings of tied objects seems reasonable.
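The rank-sum procedure just described is easy to express in code. In this sketch (illustrative; the function name is hypothetical), each ranking lists the objects from best to worst, and ties in the total score are broken by independent random keys, which gives every ordering of a tied group the same probability. It implements the rank-sum rule exactly as stated; it does not itself enumerate rankings to count permutations.

```python
import random


def estimate_standard_ranking(rankings, rng=None):
    """Rank-sum estimate of the standard ranking.

    `rankings` holds m lists, each giving the same n objects from best to
    worst. Each object's ranks are totaled over the m processes; objects are
    then ordered from lowest to highest total, and ties are broken by
    independent uniform random keys so that every ordering of a tied group
    is equally likely.
    """
    rng = rng or random.Random()
    objects = list(rankings[0])
    totals = {obj: 0 for obj in objects}
    for ranking in rankings:
        for rank, obj in enumerate(ranking, start=1):
            totals[obj] += rank
    tie_break = {obj: rng.random() for obj in objects}
    return sorted(objects, key=lambda obj: (totals[obj], tie_break[obj]))
```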

It is clear that, for fixed n, the standard ranking will tend to be chosen 
with probability one as m increases. For small m, however, the probability 
that a particular ranking will be chosen as the standard ranking does not 
seem to be easily obtainable in general. 

For example, for $n = 3$ and $m = 2$, the probability of a correct ranking is $[(1 + r)(1 + r + r^2)]^{-1}$, the probability of either of the rankings one permutation from the standard order is $r(1 + 2r)[(1 + r)(1 + r + r^2)^2]^{-1}$, the probability of either of the two rankings two permutations removed is $r^3(r + 2)[(1 + r)(1 + r + r^2)^2]^{-1}$, while the inverse ranking has probability $r^3[(1 + r)(1 + r + r^2)]^{-1}$ of being the estimate. There is little gain over using only one process to estimate the standard ranking, for then the probability of the correct and of the inverse ranking is the same as above for $m = 2$, while that of either ranking one permutation removed is $r[(1 + r)(1 + r + r^2)]^{-1}$. This peculiarity seems to arise in large part, however, from the excessive number of ties possible in this special case.

In the estimation of $p$, the methods of the previous section are also relevant. That is, a consistent estimate of $p$ as $n$ becomes large is given by


$$\hat p = \frac{mn + s - 3m}{mn + 2s - 3m} \quad \text{when } \frac{s}{mn(n - 1)} < \frac14, \qquad (4.3)$$

and an appropriate modification is made otherwise. When $n$ is small, however, it seems reasonable to consider only the unbiased estimates (3.7), to compute this estimate for each process, and then to average. Clearly such an estimate is consistent in $m$ and unbiased.
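The n = 3, m = 2 probabilities quoted above for the rank-sum estimate can be checked by exhaustive enumeration. This sketch (hypothetical names, exact fractions) lists every combination of reported rankings, forms rank sums, and splits ties equally among the consistent orderings, which matches the randomization rule described earlier.

```python
from fractions import Fraction
from itertools import permutations, product


def inversions(ranking):
    """Number of discordant pairs: the class subscript s."""
    n = len(ranking)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if ranking[i] > ranking[j])


def estimate_distribution(r, m, n=3):
    """Exact probability that the rank-sum estimate equals each ranking.

    Each of m independent processes reports a ranking (objects listed best
    first, object k being k-th in the standard order) with probability
    proportional to r**s; rank-sum ties are split equally over the orderings
    of the tied objects.
    """
    rankings = list(permutations(range(1, n + 1)))
    weight = {x: Fraction(r) ** inversions(x) for x in rankings}
    total = sum(weight.values())
    result = {x: Fraction(0) for x in rankings}
    for combo in product(rankings, repeat=m):
        prob = Fraction(1)
        rank_sum = {obj: 0 for obj in range(1, n + 1)}
        for x in combo:
            prob *= weight[x] / total
            for position, obj in enumerate(x, start=1):
                rank_sum[obj] += position
        # orderings consistent with the rank sums (tied objects in any order)
        consistent = [x for x in rankings
                      if all(rank_sum[x[i]] <= rank_sum[x[i + 1]] for i in range(n - 1))]
        for x in consistent:
            result[x] += prob / len(consistent)
    return result
```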

In the cases considered so far, the association between the processes arose because there was a common standard ranking. The apparent differences in the observed rankings arose from fluctuations around this common standard. The number of permutations from the estimated standard ranking measured the tendency for the observed rankings to agree.

The comparison of several processes which do not have the same error, or of processes which do not have the same standard ranking, involves more difficult problems which will not be treated in this paper. As rough practical guides, however, weighting the observed rankings in order to get better estimates of the standard order would, in the first problem, be an improvement over the unweighted scheme used earlier. Each ranking would lead to its own estimate of error.

If the standard ranking is not the same for all $m$ processes, this would lead to greater disagreement among the observed rankings than if there were a common standard. The pooled estimate of $p$ would then imply greater variability among the separate values of $s/n$ for each process than is the case. One might, therefore, reject the hypothesis of a common standard ranking when $\sum_{j=1}^{m}(s_j - \bar s)^2/\operatorname{var}(s_j)$ is too small, using a chi-square distribution with $m - 1$ degrees of freedom as a first approximation.


REFERENCES 


[1] Feller, W. An introduction to probability theory and its applications, Vol. I. New York: Wiley, 1950.

[2] Kendall, M. G. The advanced theory of statistics, Vol. I, Chap. 16. London: Griffin, 1945.

[3] Blumen, I., Kogan, M., and McCarthy, P. J. The industrial mobility of labor as a probability process, Chap. 7. Ithaca, N. Y.: Cornell University, 1955.


Manuscript received 4/24/56 


Revised manuscript received 8/27/56.


PSYCHOMETRIKA—VOL. 22, No. 1 
MARCH, 1957 


ESTIMATION OF ERROR VARIANCES OF TEST SCORES 


JOHN A. KEATS*


EDUCATIONAL TESTING SERVICE 


The representation of test scores as n-dimensional points leads directly to an estimate of error variance at a particular score level in the case of equivalent items. Approximations are suggested for the case of non-equivalent items and are compared with empirical data prepared by Dr. [...].


General Representation of Test Scores 


It is convenient for present purposes to consider the possible responses of a person to a test of $n$ items as points in a space of $n$ dimensions,

$$A = (a_1, a_2, a_3, \ldots, a_n),$$

where $a_i = 0$ or $a_i = 1$. If $B$ is another point, i.e.,

$$B = (b_1, b_2, b_3, \ldots, b_n),$$

then define the distance from $A$ to $B$, i.e., $D_{A,B}$, by

$$D^2_{A,B} = (a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2.$$

In particular, the score $S_A$ corresponding to $A$ is given by $S_A = D^2_{A,O} = \sum_i a_i$, the square of the distance of $A$ from the origin.


It is also convenient to consider the arrangement in the $n$-dimensional space of points having the same score value. These arrangements will form regular $(n - 1)$-dimensional figures with $\binom{n}{s}$ vertices and with centers at the point

$$C_s = \left(\frac{s}{n}, \frac{s}{n}, \ldots, \frac{s}{n}\right).$$

Since all patterns with the same score are commonly treated as equivalent, and since $C_s$ is a type of average of such patterns, it is interesting to note that $D_{O,C_s} = s/\sqrt{n}$. This follows from the fact that $D^2_{O,C_s}$ = sum of squares of coordinates of $C_s$ = $n(s^2/n^2) = s^2/n$; i.e., the score values corresponding to the centers of these figures mark off equal distances in the possible range 0 to $n$.

The next step is to consider the probability of occurrence of patterns. If all patterns are equally probable, then the distribution of scores will be the binomial distribution $P(s) = \binom{n}{s}2^{-n}$. This case is trivial, as it corresponds to zero reliability. Of more importance is the case of equal probability of patterns for given score values, with no other restriction on the total probability for a particular score. In this case all items are equivalent statistically. This case will be examined further with respect to a problem raised by Mollenkopf [4].

* Now at the University of Queensland, Brisbane, Queensland, Australia.


The Variation of the Standard Error of Measurement 


In his 1949 article Mollenkopf [4] studied the variation of the standard error of measurement with test score by considering the variance of the scores on a half test for persons who have the same score on the total test. Mollenkopf's demonstration [4, p. 191] that this variance is in fact related to the error variance for persons with the same raw score is inadequate. The inadequacy arises from the fact that the error variance varies with raw score, and what is true on the average need not be true for each individual raw score.

Let $s_j$ be the score of individual $j$, and let the scores on two parallel halves of the test be $s_{j1}$ and $s_{j2}$. Then

$$s_j = s_{j1} + s_{j2} = t_{j1} + e_{j1} + t_{j2} + e_{j2},$$

where $t$ stands for true score and $e$ for error. If individual $j$ is retested many times with parallel tests,

$$\operatorname{var}(s_j) = \operatorname{var}(e_{j1} + e_{j2}) = \text{constant}.$$

This is equal to error variance in the usual sense. But, since $t_{j1} = t_{j2}$, [...] of error variance, which is thus an estimate of error variance in the usual sense.
Consider now the case in which all patterns with a given score are equally probable, and the point $P = (P_1, P_2, \ldots, P_n)$, where $P_1$ to $P_{n/2}$ are all 1 and $P_{n/2+1}$ to $P_n$ are all 0 ($n$ is taken as even). If $s \le n/2$, then the frequency distribution of the points with score $s$ in terms of the number $(r)$ of unities in the first $n/2$ coordinates will be

$$P(r) = \frac{\binom{n/2}{r}\binom{n/2}{s - r}}{\binom{n}{s}}, \qquad (3)$$

with the corresponding form $P(r) = \binom{n/2}{n/2 - r}\binom{n/2}{n/2 - s + r}\big/\binom{n}{n - s}$, obtained by counting zeros, when $s > n/2$. Furthermore, this will be a conditional distribution in the bivariate distribution obtained by plotting the score $r$ on the first $n/2$ items against score $s - r$ on the second $n/2$ items. The condition is that total score is kept constant. The proof of these statements is given below.
Both of the distributions given have variances [1, p. 127]

$$\frac{s(n - s)}{4(n - 1)}, \qquad (1)$$

whether $s \le n/2$ or $s > n/2$, and, of course, this must equal

$$\frac{\sum_j (s_{j1} - s_{j2})^2}{4N_s}, \qquad (2)$$

where $s_{j1} + s_{j2} = s$ and $N_s$ is the frequency of score $s$, since, with $r_j = s_{j1}$, $\sum_j (s - 2s_{j1})^2 = \sum_j (s_{j2} - s_{j1})^2 = 4\sum_j (r_j - s/2)^2$, and since, the items being equivalent, $\bar r = s/2$. Hence (2) will equal (1) when $N_s$ is arbitrarily large. The relation which Mollenkopf wanted to obtain was

$$\frac{\sum_j (s_{j1} - s_{j2})^2}{N_s}$$

as a function of $s = s_{j1} + s_{j2}$. This is provided by the formula

$$\text{error variance} = \frac{s(n - s)}{n - 1} \qquad (4)$$

for the case of equivalent items.
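Formula (4) can be verified exactly for small tests. This sketch (hypothetical names; `math.comb` requires Python 3.8 or later) builds the conditional distribution (3) of the first-half score r given total score s and checks that the expected squared half difference equals s(n - s)/(n - 1).

```python
from fractions import Fraction
from math import comb


def half_score_distribution(n, s):
    """P(r) from (3): probability that r of the s correct answers lie in the
    first n/2 items, when all C(n, s) patterns with score s are equally likely."""
    h = n // 2
    return {r: Fraction(comb(h, r) * comb(h, s - r), comb(n, s))
            for r in range(max(0, s - h), min(h, s) + 1)}


def expected_squared_half_difference(n, s):
    """E[(s_1 - s_2)^2] given total score s, with s_1 = r and s_2 = s - r."""
    return sum(prob * (r - (s - r)) ** 2
               for r, prob in half_score_distribution(n, s).items())
```

For instance, `expected_squared_half_difference(40, 10)` equals 10 * 30 / 39, the value of formula (4).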
Proof of Distribution 


Consider the point $P = (P_1, P_2, \ldots, P_n)$ defined above and patterns for score $s \le n/2$. Patterns which have the same number $(r)$ of ones in the first $n/2$ elements will clearly have the same number $(s - r)$ of ones in the second $n/2$ elements and thus be the same distance from $P$. There will be

$$\binom{n/2}{r}\binom{n/2}{s - r}$$

such patterns, and thus the probability of a pattern with score $s$ falling into this $r$th category is

$$P(r) = \frac{\binom{n/2}{r}\binom{n/2}{s - r}}{\binom{n}{s}},$$

since all such patterns are assumed equally probable.

Clearly this $r$th category contains all patterns corresponding to a score $r$ on the first half of the test and a score $s - r$ on the second half, and so represents a cell in the bivariate distribution of the first half of the test against the second. The distribution $P(r)$ is the conditional distribution obtained by holding total score constant.

A similar proof holds for $s > n/2$, e.g., by considering the number of zeros instead of the number of ones.


Properties of the Formula (4) for Error Variance

From (3) and (4),

$$Y_s = \frac{\sum_j (s_{j1} - s_{j2})^2}{N_s} = \frac{s(n - s)}{n - 1}.$$

To obtain the average error variance $\bar Y$, it is necessary to multiply by $N_s$, sum with respect to $s$, and divide by the total number of people $(N)$. Then

$$\bar Y = \frac{1}{N(n - 1)}\sum_s N_s\, s(n - s) = \frac{1}{N}\sum_s \sum_j (s_{j1} - s_{j2})^2,$$

and, writing $M$ and $\sigma^2$ for the mean and variance of total scores, $\sigma_h^2$ for the variance of either half, and $r_{hh}$ for the correlation between halves,

$$\bar Y = \frac{M(n - M) - \sigma^2}{n - 1} = 2\sigma_h^2(1 - r_{hh}).$$

The extreme right-hand side can be shown to equal $\sigma^2(1 - r_{tt})$, where $r_{tt}$ is the correlation between the total test and a parallel test of the same length, i.e.,

$$\frac{M(n - M) - \sigma^2}{n - 1} = \sigma^2(1 - r_{tt}),$$

from which

$$r_{tt} = \frac{n}{n - 1}\left(1 - \frac{M(n - M)}{n\sigma^2}\right).$$

This result is the Kuder-Richardson Formula 21 [2].

Lord [3], in his treatment of the error obtained from sampling items, arrived at an estimate of variance due to this error equal to $s(n - s)/n$ if $n$ is sufficiently large so that $s/n$ is a good estimate of true score. The present formula can thus be regarded as a small-sample estimate corresponding to his formula.
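For a set of observed total scores the identity above can be checked numerically. This sketch (hypothetical names; population variance, i.e., dividing by N) computes KR-21 and the average of s(n - s)/(n - 1) over the group, which should equal σ²(1 - K).

```python
def kr21(n, mean, variance):
    """Kuder-Richardson formula 21 for an n-item test, given the mean and
    (population) variance of total scores."""
    return n / (n - 1) * (1 - mean * (n - mean) / (n * variance))


def average_error_variance(n, scores):
    """Average over persons of s(n - s)/(n - 1): Y-bar built from formula (4)."""
    return sum(s * (n - s) for s in scores) / (len(scores) * (n - 1))
```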




The formula

$$Y_s = \frac{s(n - s)}{n - 1}$$

indicates that the error variance at a particular score level can be evaluated 
without using the reliability coefficient. This is an important result as it 
demonstrates that although the reliability coefficient varies from population 
to population for the same test, the error variance at a given score level 
remains constant. 


The Effect of Non-equivalence 


Whereas the formula developed is exact under the conditions stated, 
no exact statement can be made in general. However, an obvious adjustment 
can be made in the formula so that its average value will equal the average 
error variance for some typical group. Thus for practical purposes 


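$$\sigma_e^2 = \frac{(1-R)}{(1-K)}\cdot\frac{s(n-s)}{(n-1)} = k\,\frac{s(n-s)}{n-1},$$

since

$$\sigma^2(1-R) = \frac{(1-R)}{(1-K)}\,\sigma^2(1-K) = k\,\sigma^2(1-K),$$

where R = reliability coefficient for some typical group,
K = the estimate from Kuder-Richardson formula 21 for the same group,
k = (1 − R)/(1 − K).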
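A small sketch of the corrected formula: the factor k = (1 − R)/(1 − K) rescales s(n − s)/(n − 1) so that its average over the group equals the conventional σ²(1 − R). The scores and the reliability R = .90 are hypothetical; K is computed from the same scores with Formula 21.

```python
from fractions import Fraction


def corrected_error_variance(s, n, R, K):
    """k * s(n - s)/(n - 1), where k = (1 - R)/(1 - K)."""
    k = (1 - R) / (1 - K)
    return k * Fraction(s * (n - s), n - 1)


n = 20
scores = [4, 7, 9, 10, 12, 12, 14, 15, 17, 19]   # hypothetical group
N = len(scores)
mean = Fraction(sum(scores), N)
var = Fraction(sum(x * x for x in scores), N) - mean ** 2
K = Fraction(n, n - 1) * (1 - mean * (n - mean) / (n * var))   # KR-21 for this group
R = Fraction(9, 10)                                             # hypothetical reliability

# Average of the corrected score-level error variance over the group
avg = sum(corrected_error_variance(s, n, R, K) for s in scores) / N
print(avg == var * (1 - R))   # True: the average equals sigma^2 (1 - R)
```

The correction only rescales the curve; its shape over score levels is still the parabola s(n − s).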

This average correction of Y cannot be expected to give good estimates of error variance in cases of extreme variation in item difficulty or of considerable heterogeneity of items. Its value can best be assessed by a comparison with data such as those used by Mollenkopf [4] and [5]. The graphs (Figures 1-13 below) indicate the degree to which adequate representation is obtained. For a more adequate description of the data see [4] and [5]. The points (X) indicate the mean of the squared differences for five-unit intervals of raw score. The solid line is the curve obtained from the theoretical formula. The curved broken line (- - -) is the second-degree curve of best fit presented by Mollenkopf, and the straight broken line (- · -) is the straight line obtained from the assumption that error variance is constant at all points of raw score. For all figures the values of the mean raw score (M), the standard deviation of raw scores (σ), the number of items (N) and the corrected split-half reliability (R) are included for the data on which the curves are based.

Figures 1-4 are for multiple-choice data, and the corrected formula seems to give a reasonable representation here. Mollenkopf's [...] a "not nearly as good representation of the error trend as the best fitting second degree curve" [5, p. 5]. The corrected formula gives a progressively poorer fit as variation of item difficulty increases, as shown by cases 5-13, which are arranged in order of increasing variation of item difficulty. The extent of variation of item difficulty is given in Table 6, page 218, of [4].

[Figures 1-13: Error Variance plotted against Test Score, one figure per data set. Legible annotation for Figure 7: M = 15.3, N = 66, σ = 10.3, R = .928.]

In case 13, in fact, the parabola has a minimum, not a maximum, and this could not be obtained by the corrected formula. The theoretical formulation given above could produce a minimum point by proceeding to a closer approximation to the general case, which leads to the formula

$$\sigma_e^2 = \frac{1}{n-1}\left\{ s(n-s) - n^2\,\mathrm{var}(p_s) \right\},$$


where var(p_s) is the variance of item difficulties for persons at score s. This formula was obtained by analogy with the formula for the variance of the distribution obtained by sampling from a binomial distribution with unequal probabilities, as given by Kendall [1, p. 122]. It is not particularly useful, as the computation of var(p_s) is tedious. Fortunately, very few tests used in practice have sufficient spread of item difficulty to require these computations. From the point of view of sampling of items, this formula would be appropriate for stratified sampling, and the coarser approximation using k could be taken as an approximation to this case when the variance of item difficulties is small.
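The analogy with Kendall's result rests on an exact identity for independent trials with unequal success probabilities: the variance of the number of successes, Σ p(1 − p), equals n p̄(1 − p̄) − n var(p), which is smaller than the binomial value whenever the p's differ. The sketch checks this for a hypothetical set of five item difficulties; it does not compute the score-conditional var(p_s) used in the text.

```python
from fractions import Fraction


def variance_unequal_p(ps):
    """Variance of the number of successes in independent trials with probabilities ps."""
    return sum(p * (1 - p) for p in ps)


def variance_from_mean_and_spread(ps):
    """The same quantity written as n * pbar * (1 - pbar) - n * var(p)."""
    n = len(ps)
    pbar = sum(ps) / n
    var_p = sum((p - pbar) ** 2 for p in ps) / n
    return n * pbar * (1 - pbar) - n * var_p


# Hypothetical item difficulties (probabilities of success) for a 5-item test
ps = [Fraction(1, 5), Fraction(2, 5), Fraction(1, 2), Fraction(3, 5), Fraction(9, 10)]
print(variance_unequal_p(ps) == variance_from_mean_and_spread(ps))   # True
```

With p̄ = s/n, the binomial term n p̄(1 − p̄) becomes s(n − s)/n, which is how a var(p) correction enters a score-level formula.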


Applications

The expression s(n − s)/(n − 1) can always be used in the absence of group data if the assumptions made by Lord [3] in his estimates of error variance are considered reasonable, i.e., if the test is considered a result of random sampling from a population of items.

If group data are available, then k may be estimated and the formula 
used to estimate error variance in the usual sense at a particular score level. 
As k is used as a correction factor, it might be expected that it will be fairly 
stable across groups for the same test. Figures 1 and 2 refer to the same test, 
given to two groups. It will be noticed that the two theoretical curves are more stable than the two values of (1 − R).


REFERENCES
[1] Kendall, M. G. The advanced theory of statistics. Vol. I. London: Griffin, 1947.
[2] Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.
[3] Lord, F. M. Sampling fluctuations resulting from the sampling of test items. Psychometrika, 1955, 20, 1-22.
[4] Mollenkopf, W. G. Variation of the standard error of measurement. Psychometrika, 1949, 14, 189-229.
[5] Mollenkopf, W. G. Error of measurement variation for multiple choice tests. Educational Testing Service Research Bulletin 51-1, 1951.
[6] Rulon, P. J. Reliability of split-halves. Harvard Educational Review, 1939, 9, 99-103.

Manuscript received 9/19/55
Revised manuscript received 6/7/56


PSYCHOMETRIKA—VOL. 22, No. 1 
MARCH, 1957 


THE DETAILED METHOD OF OPTIMAL REGIONS* 
PAUL S. DWYER


UNIVERSITY OF MICHIGAN 


The detailed method of optimal regions is an extended form of the method of optimal regions, which has been found effective in solving the personnel classification problem when the number of job categories is small. [...] the automatic determination of the successive $v_j$ by the more exact techniques of the detailed method [...] for the more complex problems and [...] can be mechanized. In a sense the detailed method of optimal regions is more than a detailed form of the method of optimal regions. It is essentially a method of transformations by which the original matrix is reduced to a matrix from which the solution is [...].


1. Introduction 


The personnel classification problem [1] deals with the assignment of individuals to jobs, where the contribution to the common effort of each individual $i$ if he is placed in position $j$ is the known quantity $c_{ij}$. Two recommended methods of solution are the simplex method [3] and the method of optimal regions [2]. The reader is referred to these references for the statement of the problem, the derivation of important properties, and descriptions of methods of solution.

The method of optimal regions is especially effective when, as is common, the number of different positions, $k$, is small. The method is based on the determination of a constant, $v_j$, for each position. In the detailed method of optimal regions, more specific rules are given for determining the $v_j$. Since these rules demand the calculation of auxiliary matrices, the detailed method is especially effective with machines, but it is also recommended when non-trivial problems are to be worked by hand.
Let the number of individuals to be assigned to the $k$ positions be $N$, and let $c_{ij}$ be entries in a matrix with $N$ rows and $k$ columns. The quota, $q_j$, the number of men to be assigned to position $j$, is exhibited in a row at the top of the matrix. This matrix is illustrated in Table 1, where ten individuals are to be assigned to four positions with quotas 4, 1, 4, 1. The problem is to make the assignment so that the sum of the corresponding $c_{ij}$ is as large as possible.
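For very small problems the optimum can be found by exhaustive search, which gives a reference answer against which the method can be checked. The sketch below is not the method of optimal regions; it simply enumerates every assignment that meets the quotas. The 4 × 2 matrix and the quotas are hypothetical.

```python
from itertools import permutations


def best_assignment(c, quotas):
    """Exhaustively search assignments meeting the quotas (feasible only for tiny N)."""
    slots = [j for j, q in enumerate(quotas) for _ in range(q)]
    if len(slots) != len(c):
        raise ValueError("quotas must sum to the number of individuals")
    best_total, best = None, None
    # Each distinct permutation of the slot list is one feasible assignment J_1..J_N.
    for assignment in sorted(set(permutations(slots))):
        total = sum(c[i][j] for i, j in enumerate(assignment))
        if best_total is None or total > best_total:
            best_total, best = total, assignment
    return best_total, best


# Hypothetical: 4 individuals, 2 positions, quotas 2 and 2
c = [[9, 1], [8, 2], [3, 7], [2, 6]]
print(best_assignment(c, [2, 2]))   # (30, (0, 0, 1, 1))
```

The number of feasible assignments grows factorially with N, which is why methods such as the one described here are needed for real problems.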


*Much basic research covered [...] problem of personnel classification, [...] Personnel Research Branch, Department of the Army [...]. The opinions expressed are those of the author and are not to be construed as official or as those of the Department of the Army.


2. The Conditions of Solution

The basic conditions of solution, fundamental to the simplex method and other methods as well as to the method of optimal regions, imply the existence of $u_i$ and $v_j$ [2, p. 20] such that

$$c_{ij} = u_i + v_j \ \text{for assigned values}, \qquad c_{ij} \le u_i + v_j \ \text{for unassigned values}. \tag{2.1}$$

If $J_i$ denotes the position to which individual $i$ is assigned, the first expression of (2.1) is

$$c_{iJ_i} = u_i + v_{J_i}. \tag{2.2}$$

Subtracting the second expression of (2.1) from (2.2) gives a (necessary) condition for solution:

$$c_{iJ_i} - v_{J_i} \ge c_{ij} - v_j. \tag{2.3}$$

(2.3) may be called the generalized Brogden condition [2, pp. 20-21]. The method of optimal regions is based on the $v_j$ of (2.3). The detailed method of optimal regions also uses the $u_i$ of (2.2).
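The conditions of this section can be checked directly for a proposed assignment and proposed $u_i$, $v_j$. The sketch below tests (2.1) and (2.3) on a hypothetical 2 × 2 problem with quotas 1 and 1; it also confirms that when (2.1) holds, the assignment sum equals Σ u_i + Σ q_j v_j, since each column j contributes $v_j$ once per assigned man.

```python
def satisfies_2_1(c, u, v, J):
    """(2.1): c_ij = u_i + v_j where assigned, c_ij <= u_i + v_j elsewhere."""
    for i, row in enumerate(c):
        for j, cij in enumerate(row):
            if j == J[i]:
                if cij != u[i] + v[j]:
                    return False
            elif cij > u[i] + v[j]:
                return False
    return True


def brogden_2_3(c, v, J):
    """(2.3): c_{iJ_i} - v_{J_i} >= c_ij - v_j for every i and j."""
    return all(
        row[J[i]] - v[J[i]] >= max(cij - vj for cij, vj in zip(row, v))
        for i, row in enumerate(c)
    )


c = [[3, 1], [2, 2]]     # hypothetical 2 x 2 problem
q = [1, 1]               # quotas
u, v = [3, 2], [0, 0]    # hypothetical multipliers
print(satisfies_2_1(c, u, v, [0, 1]), brogden_2_3(c, v, [0, 1]))   # True True
print(brogden_2_3(c, v, [1, 0]))                                   # False
# With (2.1) holding, the assignment sum equals sum(u_i) + sum(q_j * v_j):
print(c[0][0] + c[1][1] == sum(u) + sum(qj * vj for qj, vj in zip(q, v)))   # True
```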


3. The Determination of the Initial $v_j$

Given the values $c_{ij}$ and the quotas $q_j$, the first step of the detailed method of optimal regions (and of the method of optimal regions) is the determination of $v_j^{(0)}$, the initial values of $v_j$. Count out the $q_j$ largest values in each column $j$ and take the smallest of them. In Table 1, this process leads to the values $v_1^{(0)} = 99$, $v_2^{(0)} = 49$, $v_3^{(0)} = 27$, $v_4^{(0)} = 41$. Then $c_{ij} - v_j^{(0)} \ge 0$ for at least $q_j$ elements in column $j$.

In problems worked by hand, it is commonly useful to indicate those values which are equal to or greater than the $v_j^{(0)}$. Asterisks have been used to indicate those values.

The $v_j^{(0)}$ may be determined with the use of punched cards. One card is punched for each individual, indicating the $c_{ij}$ values for all positions. The cards are then sorted for each position, and the $v_j^{(0)}$ are determined from the sorter card count or from a tabulator run using cumulated frequencies.
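The rule for $v_j^{(0)}$ (the smallest of the $q_j$ largest values in column $j$) amounts to a per-column sort. A minimal sketch on a hypothetical 4 × 2 matrix with quotas 2 and 2; the second line counts the entries that would receive an asterisk.

```python
def initial_v(c, quotas):
    """v_j^(0): the smallest of the q_j largest values in column j."""
    v = []
    for j, q in enumerate(quotas):
        column = sorted((row[j] for row in c), reverse=True)
        v.append(column[q - 1])
    return v


c = [[9, 5], [8, 4], [7, 3], [1, 6]]   # hypothetical 4 x 2 matrix
quotas = [2, 2]
v0 = initial_v(c, quotas)
print(v0)   # [8, 5]
# At least q_j entries of column j satisfy c_ij - v_j >= 0 (the "asterisked" entries):
print([sum(row[j] >= v0[j] for row in c) for j in range(len(quotas))])   # [2, 2]
```

With tied values in a column the count can exceed $q_j$, which is why the text says "at least".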


4. The Determination of the Initial Assignment and the $u_i^{(0)}$ Values

The initial assignment, $J_i^{(0)}$, is then made with the use of (2.3). Thus in Table 1, compare $c_{ij} - v_j^{(0)}$ for successive values of $j$ for each $i$, and make the initial assignment $J_i^{(0)}$ to that column for which $c_{ij} - v_j^{(0)}$ is largest. Individual 1 is initially assigned to job category 1 since −6 is greater than −36, −11, or −27. In case of a tie for the largest value of $c_{ij} - v_j^{(0)}$, both values of $j$ are recorded in the column for $J_i^{(0)}$.




[...] If there are two or more asterisks in [...], only the columns with asterisks need be considered in applying [...]. The number of initial assignments to each position is then determined. [...] is placed, for comparison, above the $q_j$. [...] indicates the number of [...] initial assignments [...] of ties. Then form $q_j^{(0)} - q_j$, which indicates an excess if positive and a deficiency if negative. In Table 1 there is an excess of two assignments in column 1 and deficiencies of single assignments in columns 2 and 4.

Next determine the $u_i^{(0)}$ values. From (2.2),

$$u_i = c_{iJ_i} - v_{J_i}, \tag{4.1}$$

and then $u_i^{(0)} = c_{iJ_i^{(0)}} - v_{J_i^{(0)}}^{(0)}$. The results are placed in the column labelled $u_i^{(0)}$. In practice, it is commonly convenient to determine the values of $u_i$ simultaneously with the values of $J_i$.
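The initial assignment, the $u_i^{(0)}$ of (4.1), and the excess/deficiency row can be computed in one pass over the rows, as the text suggests. The sketch uses a hypothetical 4 × 2 matrix together with its $v_j^{(0)}$ (the second-largest value in each column for quotas 2 and 2). How tied rows enter the counts is an assumption here: a tied row is counted in each of its tied columns.

```python
def initial_assignment(c, v):
    """J_i^(0) from (2.3): the column(s) maximizing c_ij - v_j; u_i^(0) from (4.1)."""
    J, u = [], []
    for row in c:
        diffs = [cij - vj for cij, vj in zip(row, v)]
        best = max(diffs)
        J.append([j for j, d in enumerate(diffs) if d == best])   # ties are recorded
        u.append(best)                                            # u_i = c_iJ - v_J
    return J, u


def excess(J, quotas):
    """q_j^(0) - q_j: positive means excess, negative means deficiency.
    Assumption: a tied row is counted in every one of its tied columns."""
    counts = [0] * len(quotas)
    for cols in J:
        for j in cols:
            counts[j] += 1
    return [cnt - q for cnt, q in zip(counts, quotas)]


c = [[9, 5], [8, 4], [7, 3], [1, 6]]   # hypothetical matrix
v0 = [8, 5]                            # its v_j^(0) for quotas 2 and 2
J0, u0 = initial_assignment(c, v0)
print(J0, u0, excess(J0, [2, 2]))      # [[0], [0], [0], [1]] [1, 0, -1, 1] [1, -1]
```

Here column 1 (index 0) has an excess of one and column 2 a deficiency of one, so a transformation is needed.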

5. The Determination of the First Transformed Matrix 


The first transformed matrix is computed using the formula

$$c_{ij}^{(1)} = c_{ij} - u_i^{(0)} - v_j^{(0)}. \tag{5.1}$$

Every element is either zero or negative, since $u_i^{(0)}$ and $v_j^{(0)}$ are determined so that $c_{ij} \le u_i^{(0)} + v_j^{(0)}$. The values $c_{ij}^{(1)}$ resulting from the application of (5.1) to the problem of Table 1 are shown in Table 2.

The negative signs in this matrix (and the following ones) can be eliminated by using the alternative transformation

$$t_{ij}^{(1)} = u_i^{(0)} + v_j^{(0)} - c_{ij}. \tag{5.2}$$

This is illustrated in Table 3. The value of $v_j^{(1)}$ is then the value of the $q_j$th smallest $t_{ij}^{(1)}$ in column $j$. The values of $J_i^{(1)}$ and [...] are then determined using

$$J_i^{(1)}:\ \min_j \left(t_{ij}^{(1)} - v_j^{(1)}\right), \tag{5.3}$$

as illustrated in Table 3. The values of $J_i^{(1)}$ are indicated by the zero values of [...]. The summary values $q_j^{(1)}$ and $q_j^{(1)} - q_j$ are recorded in the top rows. Examination shows that the transformation process is not yet completed, since there is an excess of at least one assignment in position 1. Hence an additional transformation is carried out.

6. The Determination of Successive Transformations

Since the values of $v_j^{(1)}$ are available, only the values $u_i^{(1)}$ are needed to complete the transformation. Now

$$u_i^{(1)} = \min_j \left(t_{ij}^{(1)} - v_j^{(1)}\right), \tag{6.1}$$

and the next transformation is given by

$$t_{ij}^{(2)} = t_{ij}^{(1)} - u_i^{(1)} - v_j^{(1)}. \tag{6.2}$$
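One reading of the transformation steps in Sections 5 and 6 can be sketched as follows: form the non-negative matrix $t^{(1)} = u^{(0)} + v^{(0)} - c$; take $v_j$ as the $q_j$th smallest entry in column $j$; take $u_i = \min_j (t_{ij} - v_j)$; and subtract both. Every entry of the result is non-negative, and every row keeps at least one zero. This is a sketch under that reading, using hypothetical data (the same 4 × 2 matrix, $v^{(0)}$ and $u^{(0)}$ as a worked example); the column-subset adjustment described later in Section 6 is not implemented.

```python
def first_transform(c, u0, v0):
    """t_ij^(1) = u_i^(0) + v_j^(0) - c_ij: the non-negative counterpart of (5.1)."""
    return [[u0[i] + v0[j] - cij for j, cij in enumerate(row)] for i, row in enumerate(c)]


def next_transform(t, quotas):
    """One step: v_j = q_j-th smallest t_ij in column j, u_i = min_j (t_ij - v_j),
    new t_ij = t_ij - u_i - v_j."""
    k = len(quotas)
    v = [sorted(row[j] for row in t)[quotas[j] - 1] for j in range(k)]
    u = [min(row[j] - v[j] for j in range(k)) for row in t]
    new_t = [[row[j] - u[i] - v[j] for j in range(k)] for i, row in enumerate(t)]
    return new_t, u, v


c = [[9, 5], [8, 4], [7, 3], [1, 6]]           # hypothetical data
t1 = first_transform(c, [1, 0, -1, 1], [8, 5])
print(t1)                                       # [[0, 1], [0, 1], [0, 1], [8, 0]]
t2, u1, v1 = next_transform(t1, [2, 2])
print(v1, u1, t2)   # [0, 1] [0, 0, 0, -1] [[0, 0], [0, 0], [0, 0], [9, 0]]
```

In `t2`, rows 1 to 3 are tied between the two columns and row 4 has its only zero in column 2, so the quotas (2, 2) can be met using zero entries alone.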


The application of this transformation to the matrix of Table 3 leads to the matrix of Table 4. The symbol 0 is used for each of the zero terms appearing in the same row. Thus the ties of Table 3 are indicated by the 0's of Table 4.

The values of $q_j^{(2)}$ show an excess of at least 1 in column 1. Hence one of the men tentatively assigned to column 1 must be assigned to one of the other columns. This is accomplished by subtracting from column 1 the smallest non-zero entry in any of the rows corresponding to individuals tentatively assigned to position 1. In Table 4, this value is 2; so [...]. The remaining values of $v_j^{(2)}$ are 0, but they need not be recorded since nothing is to be subtracted.

The values of $J_i^{(2)}$ are then determined, together with the summary $q_j$ values. There are no excesses or deficiencies indicated either in the single columns or in the combinations of columns. The obvious assignment of ties leads to the set of $J_i$ values identifying the solution.

In some cases it is necessary to make transformations on combinations of columns, since the method leads to a solution only when every combination of columns, as well as each column separately, has no deficiency [4, p. 16]. The technique for finding a suitable transformation when there is a deficiency in several columns differs slightly from that described above. In Table 3, note that an excess in column 1 indicates a deficiency in columns 2, 3 and 4. Indeed, a summary of the $J_i^{(1)}$ column shows only five men with 0 in columns 2, 3 or 4. But $q_2 + q_3 + q_4 = 6$. Hence there is a deficiency of 1 in this subset. A common positive amount can be subtracted from each of these columns to introduce an additional zero term, provided the negative of this amount is subtracted from every row which has at least one zero term in columns 2, 3, 4. In this way the tentative assignments to the columns having a net deficiency are maintained, while at least one new assignment is added to these columns. The amount to subtract from the columns is the smallest (non-zero) number in those columns which is not in a row tentatively assigned to column 2, column 3 or column 4. In this way the transformation leads to a matrix having the desired property that every element is non-negative.

In Table 4, the amount is 2; so $v_2^{(2)} = v_3^{(2)} = v_4^{(2)} = 2$ with $v_1^{(2)} = 0$. These values of $v_j^{(2)}$ lead to values of $J_i^{(2)}$ which are identical with those of Table 4. The two transformations are essentially equivalent, since they lead to the same matrix. This is the $t_{ij}^{(3)}$ matrix of Table 5. Assignments satisfying the quotas can be made to the zero terms of this matrix.
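The counting test used above (no deficiency in any single column or any combination of columns) can be checked mechanically. For each row, the sketch takes the set of columns holding a zero in the current transformed matrix, and it reports every subset of columns whose rows cannot cover the subset's combined quota. The zero pattern is hypothetical. It is chosen to mimic the situation described for Table 3: quotas 4, 1, 4, 1, and only five men with a zero in columns 2, 3 or 4. Columns are numbered from 0, so (1, 2, 3) means columns 2, 3, 4.

```python
from itertools import combinations


def deficient_subsets(zero_cols, quotas):
    """Return (subset, shortage) for every column subset whose rows cannot meet its quotas.

    zero_cols[i] is the set of columns in which row i has a zero entry."""
    k = len(quotas)
    out = []
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            s = set(subset)
            available = sum(1 for cols in zero_cols if cols & s)   # rows able to serve s
            needed = sum(quotas[j] for j in subset)
            if available < needed:
                out.append((subset, needed - available))
    return out


# Hypothetical zero pattern: five rows with a zero only in column 0, five spread over 1-3
zero_cols = [{0} for _ in range(5)] + [{1, 2}, {2}, {2}, {2}, {2, 3}]
quotas = [4, 1, 4, 1]
print(deficient_subsets(zero_cols, quotas))   # [((1, 2, 3), 1)]
```

This is the counting condition of Hall's marriage theorem. Enumerating all 2^k − 1 subsets is practical only because k is small.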




The method is designed, at each step, to decrease the number of deficiencies in a particular column or combination of columns without increasing the number of deficiencies in the remaining columns. The method [...] converges, since the total number of deficiencies is finite and a sufficient condition for solution is an assignment with no deficiencies in any column or combination of columns [4, p. 16]. The process converges [...] the common case in which the number of job categories is [...] has led to the empirical conclusion that, for small values [...], the number of transformations required for solution is approximately [...]. Once the row deviates described in the next section are available, the number of transformations required is commonly less than $k/2$.


7. The Use of Row Deviates

One device which is useful in speeding the convergence of the method is the use of row deviates. Any constant may be subtracted from any row without changing the solution, since (2.3) is not changed by subtracting a constant from $c_{iJ_i}$ and from $c_{ij}$. Subtraction of the mean of the row from each element in the row results in row deviates from the mean. Preferably one may use large row deviates defined by

$$C_{ij} = k\left(c_{ij} - \bar{c}_{i\cdot}\right) = k c_{ij} - \sum_{j=1}^{k} c_{ij} = k c_{ij} - c_{i\cdot}, \tag{7.1}$$

where $c_{i\cdot}$ and $\bar{c}_{i\cdot}$ are, respectively, the sum and mean for row $i$. The matrix of row deviates is then treated by the method described above. In the illustration used above, the values of $U_i^{(0)}$ and $V_j^{(0)}$ obtained from the $C_{ij}$ matrix are almost adequate for determining the solution. This is shown in Table 6. Only a slight additional adjustment is necessary in column 4. The advantage of the large row deviate transformation may be seen from the fact that the columns of the $C_{ij}$ matrix are generally uncorrelated or slightly negatively correlated, so that large values in one column are not apt to be accompanied by large values in some other column.

The values of $J_i$ in Table 6 are identical with those obtained above.
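Why the large row deviates leave the solution unchanged can be seen by summing (7.1) over any feasible assignment. Each row contributes exactly one term, so Σ C = k·Σ c − Σ_i c_i., and the subtracted grand total is the same for every assignment. The sketch verifies this on a hypothetical 4 × 2 matrix by enumerating all assignments that meet the quotas.

```python
from itertools import permutations


def large_row_deviates(c):
    """(7.1): C_ij = k * c_ij - c_i. (the row total)."""
    k = len(c[0])
    return [[k * cij - sum(row) for cij in row] for row in c]


def feasible_assignments(quotas):
    """All distinct assignments meeting the quotas (tiny problems only)."""
    slots = [j for j, q in enumerate(quotas) for _ in range(q)]
    return sorted(set(permutations(slots)))


def total(m, assignment):
    return sum(m[i][j] for i, j in enumerate(assignment))


c = [[9, 1], [8, 2], [3, 7], [2, 6]]   # hypothetical data
quotas = [2, 2]
C = large_row_deviates(c)
k, grand = len(c[0]), sum(map(sum, c))
feasible = feasible_assignments(quotas)

# Every feasible C-total is k times the c-total minus the same constant,
# so both matrices rank the feasible assignments identically.
print(all(total(C, a) == k * total(c, a) - grand for a in feasible))   # True
best_c = max(feasible, key=lambda a: total(c, a))
best_C = max(feasible, key=lambda a: total(C, a))
print(best_c == best_C)                                                 # True
```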


8. Solution of a Problem in the Frequency Form

An illustration is next presented with $k = 5$ and in which it is necessary to analyze subsets of columns even though (large) deviates are used. For this purpose a frequency-form problem which Votaw and Dailey [4, p. 7] have worked with the simplex method is examined. A frequency-form problem results from the grouping of individual categories so that frequencies ($f_i$) as well as quotas ($q_j$) appear. The number of personnel categories is $n$. The $n = 4$ values of $f_i$, as well as the $k = 5$ values of $q_j$, are shown in the first matrix of Table 7. The values of $c_{i\cdot}$ are first computed, and then the values $C_{ij}$ are recorded in the second matrix. In determining the values $V_j^{(0)}$, consider the frequencies associated with each row. Thus $V_1^{(0)} = -1$, since the 12 + 23 values of [...] quota of 15. The values of $J_i^{(0)}$ are then obtained with the generalized Brogden condition. [...] apparent [...] that the columnar quotas can be met individually but that [...] the subset of columns 1, 2, 5 [...] cannot [...] jobs. A transformation is in order [...].

The values $U_i^{(0)}$ are computed, and then the values $T_{ij}^{(1)}$ appearing in the [...] matrix. The values of $J_i^{(1)}$ summarize the zero terms. The deficiency in the subset consisting of columns 1, 2, 5 can be met after the matrix is transformed by subtracting some quantity from each of these columns to introduce zeros in the columns. The quantity to be subtracted is the smallest non-zero quantity in the rows not tentatively assigned to columns 1, 2, or 5, and is 1; so $V_1^{(1)} = V_2^{(1)} = V_5^{(1)} = 1$ and, of course, $V_3^{(1)} = V_4^{(1)} = 0$.

The values $J_i^{(2)}$ are then determined. The number of available assignments in each row is so large that assignments satisfying the frequencies and quotas can be made in many different ways.

The additional transformation indicated by the values of $V_j^{(2)}$ is made so that the final $T_{ij}$ matrix results. This transformation is not necessary to the solution, since a solution can be obtained from the last column of the third matrix, but the solution may also be obtained by making assignments to the zero terms of the last matrix in any way so as to satisfy the quotas and frequencies.

[The tables for the worked example of Table 1 and those following are printed at this point in the original; their contents are not recoverable from the scan.]


frequencies. 
9. The Determination of $u_i$ and $v_j$

It is now possible to determine the values of $u_i$ and $v_j$ of (2.1). Let $t_{ij} = t_{ij}^{(m+1)}$ represent the final transform, so that

$$t_{ij} = 0 \ \text{for assigned values}, \qquad t_{ij} \ge 0 \ \text{for unassigned values}. \tag{9.1}$$

Consider first the case in which the transformations are applied to the $c_{ij}$ matrix without using row deviates. Then

$$c_{ij} = \left(u_i^{(0)} + v_j^{(0)}\right) - \left(u_i^{(1)} + v_j^{(1)}\right) - \cdots - \left(u_i^{(m)} + v_j^{(m)}\right) - t_{ij}, \tag{9.2}$$

so that (9.1) yields (2.1) with

$$u_i = u_i^{(0)} - u_i^{(1)} - \cdots - u_i^{(m)}, \qquad v_j = v_j^{(0)} - v_j^{(1)} - \cdots - v_j^{(m)}. \tag{9.3}$$

The values of $u_i$ and $v_j$ for the problem of Table 1 were computed using (9.3) and are shown in the last column and row of Table 1.

[Table 6, Solution with Large Row Deviates, and Table 7, Solution of a Problem in the Frequency Form, are printed here in the original; their contents are not recoverable from the scan.]

The determination of $u_i$ and $v_j$ for problems using the large row deviate transformation is more involved. If the $u_i$ and $v_j$ appropriate to the $C_{ij}$ matrix are $U_i$ and $V_j$, a set of non-negative values of $v_j$ can be determined from

$$v_j = \frac{V_j - V_{\min}}{k},$$

where $V_{\min}$ is the smallest $V_j$. Thus in Table 6, the values of $V_j$ are 19, 66, −1, 43; so the values of $v_j$ are 5, 16 3/4, 0, 11. Again, in Table 7, the values of $V_j$ are −2, 3, 8, 8, 3; so the values of $v_j$ are 0, 1, 2, 2, 1. Other sets of $v_j$ can be obtained by adding constants.
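The conversion from the $V_j$ of the large-deviate matrix to non-negative $v_j$ on the original scale can be reproduced with the numbers quoted in the text. The division by $k$ follows from (7.1): $c_{ij} = (C_{ij} + c_{i\cdot})/k$, so a column constant $V_j$ on the $C$ scale corresponds to $V_j/k$ on the $c$ scale. Subtracting the common constant $V_{\min}$ only shifts all $v_j$ equally, which is allowed because $v_j$ are determined up to an additive constant.

```python
from fractions import Fraction


def v_from_large_deviate_V(V, k):
    """Non-negative v_j = (V_j - V_min)/k from the V_j of the C_ij = k c_ij - c_i. matrix."""
    V_min = min(V)
    return [Fraction(Vj - V_min, k) for Vj in V]


table6 = v_from_large_deviate_V([19, 66, -1, 43], 4)
table7 = v_from_large_deviate_V([-2, 3, 8, 8, 3], 5)
print([str(x) for x in table6])   # ['5', '67/4', '0', '11']
print([str(x) for x in table7])   # ['0', '1', '2', '2', '1']
```

67/4 is the 16 3/4 quoted for Table 6.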


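10. The Determination of the Assignment Sum

The assignment sum can be determined by applying the assignments for each row to the original $c_{ij}$ matrix. This is illustrated in Table 5; the values of $c_{iJ_i}$ are listed for each $i$, and the sum is 315 units. An alternative method is based on the formula

$$\sum_i c_{iJ_i} = \sum_i u_i + \sum_j q_j v_j = \left(\sum_i u_i^{(0)} + \sum_j q_j v_j^{(0)}\right) - \left(\sum_i u_i^{(1)} + \sum_j q_j v_j^{(1)}\right) - \cdots - \left(\sum_i u_i^{(m)} + \sum_j q_j v_j^{(m)}\right). \tag{10.1}$$

The values in parentheses are given in the lower right corner of the respective matrices. If a problem in the frequency form is used, the values of $\sum u_i$ are replaced by $\sum f_i u_i$. If large row deviates are used, the appropriate formula is

$$\sum_i c_{iJ_i} = \frac{1}{k}\left[\sum_i c_{i\cdot} + \left(\sum_i U_i^{(0)} + \sum_j q_j V_j^{(0)}\right) - \left(\sum_i U_i^{(1)} + \sum_j q_j V_j^{(1)}\right) - \cdots - \left(\sum_i U_i^{(m)} + \sum_j q_j V_j^{(m)}\right)\right]. \tag{10.2}$$

Thus in Table 6,

$$\sum_i c_{iJ_i} = \frac{1050 + 211 - 1}{4} = 315 \ \text{units}.$$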
Cili ~ 4 


11. Interpretation of the Method

In a sense the detailed method of optimal regions is more than a detailed form of the method of optimal regions. [...] specific rules are given for determining the successive increments [...] of reductions in which an original [...] the assignment can be determined. The method is especially effective, particularly when [...], in solving non-trivial personnel assignment problems with [...] positions.


REFERENCES 


[1] Brogden, H. E. An approach to the problem of differential prediction. Psychometrika, 
1946, 11, 139-154. 

[2] Dwyer, P. S. Solution of the personnel classification problem with the method of optimal 
regions. Psychometrika, 1954, 19, 11-26. 

[3] Votaw, D. F. Methods of solving some personnel-classification problems. Psychometrika, 1952, 17, 255-266.

[4] Votaw, D. F., Jr. and Dailey, J. T. Assignment of personnel to jobs. Research Bulletin 
52-24, Air Training Command, Human Resources Research Center. Lackland Air 
Force Base, August, 1952. 


Original manuscript received 8/18/55 
Revised manuscript received 10/10/56 




PSYCHOMETRIKA—VOL. 22, No. 1 
MARCH, 1957 


THE DEVELOPMENT OF 
HIERARCHICAL FACTOR SOLUTIONS* 


JOHN SCHMID AND JOHN M. LEIMAN


AIR FORCE PERSONNEL AND TRAINING RESEARCH CENTER 


Although simple structure has proved to be a valuable principle for rotation of axes in factor analysis, an oblique factor solution often tends to confound the resulting interpretation. A model is presented here which transforms the oblique factor solution so as to preserve simple structure and, in addition, to provide orthogonal reference axes. Furthermore, this model makes explicit the hierarchical ordering of factors above the first-order domain.


The purpose of this paper is to present a procedure for transforming an oblique factor analysis solution containing a hierarchy of higher-order factors into an orthogonal solution which not only preserves the desired interpretation characteristics of the oblique solution, but also discloses the hierarchical structuring of the variables.

Oblique simple structure was proposed by Thurstone as a factor model useful for psychological research because of the simplicity with which interpretation could be made from a set of linear components underlying a set of scores. His argument is convincing when consideration is given to his "box problem" [9, pp. 140-146], for the factor loadings readily identify the dimensions of the boxes. In many studies, however, correlations among the reference axes make interpretation of simple structure difficult or questionable. In such cases the usual methods of transformation from oblique to orthogonal axes fail to clarify the nature of the underlying parameters, because many of the vanishing factor loadings become non-vanishing, thereby destroying simple structure. If one is willing to disavow the principle of parsimony of common factors, one may employ the type of factor solution outlined in this paper. This solution not only furnishes simple structure on orthogonal reference axes, but also provides a more complete rationale of the structuring of psychological traits than that given by (i) a conventional oblique solution or, for that matter, (ii) a solution in which the number of common factors is equal to the rank of the reduced correlation matrix.

*[...] Lloyd G. Humphreys for his encouragement [...] task. This investigation was [...] Personnel and Training Research Center program in [...]. Permission is granted for reproduction, [...] publication, and use or disposal [...] United States Government.

It seems reasonable to assume that psychological behavior may be 
conceived as functioning at different levels of complexity. That is, a complex 
behavior activity might be thought of as an assembly of progressively less 
complex levels of activity—each level may have semantic, psychological, or 
practical meaning. For example, Vernon [11, pp. 22-24] reports that the 
mental structure of a group of British Army and Navy recruits was examined 
with a battery of cognitive tests. As determined from the sign pattern of 
centroid factor loadings, one general factor was found to be present in all 
tests. This factor was designated as g. With the elimination of g, the battery
could be fractionated into two main groups of tests: academic and practical.
In turn, the academic factor could be broken into verbal, numerical, and
educational factors; the practical factor could be broken into mechanical,
spatial, and physical factors. This structuring of the tests into a hierarchy
of factors has many recommendable features—it provides information about 
the classification of tests and the behaviors measured by them in varying 
orders of concurrence and dependence. Had this particular centroid solution 
been rotated to an oblique solution, the hierarchical ordering would have 
been lost or rendered uncertain. 

Structuring of tests into a hierarchical pattern is not a new consideration. 
Holzinger’s bi-factor solution is a special case in which one second-order 
factor overlays the first-order group factors. Burt [1, 2, 3] has strongly advocated the hierarchical model for many years. His group factor method, which yields this hierarchy, proceeds by successive grouping of variables according to their sign pattern in a centroid solution. The procedure set forth in this paper, however, is an elaboration of the procedure demonstrated by Thompson [8, pp. 297-302] and Thurstone [9, pp. 411-439]. It differs from Burt's not in the product but in the process. The hierarchical solution is shown to be a consequence of successively obtained higher-order factor solutions. A necessary condition is the existence of simple structure at each level. If oblique simple structure exists, it can be recast into a hierarchical pattern similar in kind to that which Vernon inferred from the centroid solution. It will be seen that the characteristics of simple structure are retained not only at the level of the first-order factors but also at all levels.


Mathematical Rationale 


The mathematical rationale for the model outlined in the paper is derived from Tucker's [10] generalization of the fundamental factor theorem stated by Thurstone [9, p. 78]. This theorem states that a correlation matrix, R, may be decomposed into correlated common factors and unique factors:

$$R = P\phi P' + U^2, \tag{1}$$

where P represents the coordinates of the vector representation of the variables on oblique Cartesian reference axes or factors, φ represents the intercorrelations among the oblique reference axes, and U represents the unique factor coefficients. It is readily seen that if φ is the identity matrix, the fundamental factor theorem of Thurstone results.

A second theorem used in this development also stems from Tucker's article. He shows that if the intercorrelations among the factors, φ, can be decomposed as

$$\phi = HH', \tag{2}$$

then the oblique factors, P, may be transformed into orthogonal factors, F, according to the operation

$$PH = F. \tag{3}$$

That is, the coordinates of the variables represented as vectors may be transformed from oblique to orthogonal reference axes. Each row of H represents the direction cosines of the oblique axes with respect to the orthogonal axes developed by the decomposition.
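Tucker's result in (2) and (3) can be illustrated with the square-root (Cholesky) decomposition, one of the factoring methods named below. The sketch assumes a hypothetical two-factor oblique pattern and a factor intercorrelation of .56. It checks that F = PH reproduces the common-factor part PφP′, and that each row of H has unit length, as direction cosines must. These illustrative values reproduce, for example, the .3136 entry among the legible correlations of Table 1, but they should still be treated as assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular H with a = H H' (the square-root method of factoring)."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(H[i][m] * H[j][m] for m in range(j))
            if i == j:
                H[i][i] = math.sqrt(a[i][i] - s)
            else:
                H[i][j] = (a[i][j] - s) / H[j][j]
    return H


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


phi = [[1.0, 0.56], [0.56, 1.0]]                       # hypothetical factor intercorrelations
P = [[0.8, 0.0], [0.9, 0.0], [0.0, 0.7], [0.0, 0.6]]   # hypothetical oblique pattern

H = cholesky(phi)    # (2): phi = H H'
F = matmul(P, H)     # (3): P H = F, coordinates on orthogonal axes

# F F' reproduces the common-factor part P phi P' of the correlation matrix
lhs = matmul(F, transpose(F))
rhs = matmul(matmul(P, phi), transpose(P))
print(max(abs(x - y) for r1, r2 in zip(lhs, rhs) for x, y in zip(r1, r2)) < 1e-12)   # True
print(round(rhs[0][2], 4))   # 0.3136
```

Any other decomposition φ = HH′ (centroid, principal axes) would give a different orthogonal F with the same FF′, which is Guttman's point in the next paragraph.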

Guttman [4] demonstrates that if a matrix of intercorrelations, Φ, is 
factored as in (2) no matter how H is built up, the reference axes are ortho- 
gonal. The factoring or decomposition may involve any of a variety of 
procedures, such as the diagonal or square root method of factoring, the 


centroid procedure, or the method of principal axes.
The development of the hierarchical model utilizes these propositions. 


In the following discussion, $P_i$ will refer to the primary factor pattern of the $i$th order variables or factors; that is, the coordinates of the vector representation of the variables on the $i$th order oblique reference axes. $R_i$ will be used to designate the intercorrelations among the $i$th order primary factor reference axes, and $U_i$ will represent the unique $i$th order variables or factors.
At the outset, the initial correlation matrix, R, is decomposed according to (1) as follows:

$$R = P_1 R_1 P_1' + U_1^2. \tag{4}$$

In like manner, $R_1$ is decomposed

$$R_1 = P_2 R_2 P_2' + U_2^2. \tag{5}$$

In turn, $R_2$ is decomposed

$$R_2 = P_3 R_3 P_3' + U_3^2. \tag{6}$$

Each higher-level matrix of intercorrelations among primary factors is decomposed in this fashion until $R_i$ becomes the identity matrix, which implies that the $i$th order primary factors are orthogonal. That is, the last decomposition becomes

$$R_{i-1} = P_i P_i' + U_i^2. \tag{7}$$


In many cases, $R_i$ becomes a unit scalar and $P_i$, therefore, is merely a column matrix. Elementary matrix manipulation permits (7) to be rewritten as a product of a supermatrix and its transpose:

$$R_{i-1} = [P_i : U_i]\,[P_i : U_i]'. \tag{8}$$

Designating the supermatrix, $[P_i : U_i]$, by $B_i$, according to (3) the $(i-1)$th order primary factors, $P_{i-1}$, can be made orthogonal by the operation $P_{i-1}B_i$. However,

$$R_{i-2} = P_{i-1} R_{i-1} P_{i-1}' + U_{i-1}^2 = P_{i-1} B_i B_i' P_{i-1}' + U_{i-1}^2. \tag{9}$$

Therefore, it follows that $R_{i-2}$ may be rewritten as a product of a new supermatrix and its transpose:

$$R_{i-2} = [P_{i-1}B_i : U_{i-1}]\,[P_{i-1}B_i : U_{i-1}]'. \tag{10}$$

This new supermatrix may be designated as $B_{i-1}$. By virtue of (2) and Guttman's demonstration [4], orthogonal reference axes are obtained. Furthermore, $B_{i-1}$ serves to rotate the primary pattern, $P_{i-2}$, to this orthogonal reference framework. Continuing this process to the lowest-order level, the initial primary or first-order factors, $P_1$, are orthogonalized by the operation $P_1 B_2$. Designate $P_1 B_2$ as $B$ instead of $B_1$, since one is usually not concerned with explicitly appending the diagonal matrix of unique factors to the common factor solution. $B$, then, is the hierarchical solution. Since


(11) R (with communalities) = BB’, 


B represents coordinates of the test variables on orthogonal axes 
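The recursion in (8) through (11) can be sketched in NumPy. This is a minimal illustration, assuming the primary patterns $P_1, \ldots, P_k$ have already been obtained by factoring and rotation, and that the highest-order factors are orthogonal ($R_k = I$). The function names and the toy matrices are hypothetical and are not the data of the illustration below. At each level the unique loadings are taken as the square roots of one minus the row sums of squares of $P_iB_{i+1}$. This equals one minus the diagonal of $P_iR_iP_i'$, because $R_i$ (with unities) $= B_{i+1}B_{i+1}'$.

```python
import numpy as np


def unique_loadings(common):
    """Diagonal matrix U with U**2 = I - diag(common @ common.T)."""
    communalities = np.sum(common ** 2, axis=1)
    return np.diag(np.sqrt(1.0 - communalities))


def hierarchical_solution(P1, higher_patterns):
    """Build B = P_1 B_2 by the recursion behind (8)-(10).

    P1              -- first-order primary pattern (variables x first-order factors)
    higher_patterns -- [P_2, ..., P_k]; P_(i+1) gives the i-th order factors on the
                       (i+1)-th order factors. The top-order factors are assumed
                       orthogonal (R_k = I), as in (7).
    """
    B = None
    for P in reversed(higher_patterns):        # start at the highest order
        common = P if B is None else P @ B     # P_k first, then P_i B_(i+1)
        B = np.hstack([common, unique_loadings(common)])  # B_i = [P_i B_(i+1) : U_i]
    return P1 @ B                              # B = P_1 B_2


# Hypothetical example (not the paper's data): four variables, two correlated
# first-order factors, and one second-order factor, so R_2 is the unit scalar 1.
P1 = np.array([[0.8, 0.0], [0.9, 0.0], [0.0, 0.7], [0.0, 0.6]])
P2 = np.array([[0.8], [0.7]])
R1 = P2 @ P2.T
np.fill_diagonal(R1, 1.0)                      # R_1 with unities; r_12 = .56

B = hierarchical_solution(P1, [P2])
print(np.round(B, 4))
print(np.allclose(B @ B.T, P1 @ R1 @ P1.T))    # check of (11); prints True
```

Because every column of $B$ is orthogonal to the others by construction, the final check is exactly the statement of (11): the off-diagonal correlations and the communalities are reproduced.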

In the development of a hierarchical solution, careful attention should be paid to the distinction between simple structure and primary pattern. This distinction has been clearly drawn and illustrated by Harris and Knoell [5]. The hierarchical solution is contingent upon the development of a primary pattern at each level. This primary pattern, however, may be obtained from the simple structure, which is computed either graphically or analytically. Once simple structure is identified, it may easily be converted to a primary pattern [5] by the operation

(12)   $P_i = V_i (R_i^{-1})_d^{1/2}$,

where $P_i$ is the primary pattern, $V_i$ is the simple structure, and $(R_i^{-1})_d^{1/2}$ is the diagonal matrix of the reciprocals of the direction cosines between each primary axis and its own simple structure reference axis. $(R_i^{-1})_d^{1/2}$ is obtained by taking the square roots of the diagonal elements only of $R_i^{-1}$.
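Operation (12) amounts to rescaling each column of the simple structure. The following minimal NumPy sketch assumes that $V$ and the primary-factor intercorrelations $R$ are already available; the function name and the two-factor numbers are hypothetical.

```python
import numpy as np


def primary_pattern(V, R):
    """Convert a simple-structure matrix V to a primary pattern via (12).

    V -- simple structure (projections on the reference axes), variables x factors
    R -- intercorrelations of the primary factors (unit diagonal)
    Returns V @ D, where D holds the square roots of the diagonal of R^{-1}.
    """
    D = np.diag(np.sqrt(np.diag(np.linalg.inv(R))))   # (R^{-1})_d^{1/2}
    return V @ D


# Hypothetical two-factor example with r = .5 between the primary factors.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
V = np.array([[0.6, 0.0], [0.0, 0.3]])
P = primary_pattern(V, R)
print(np.round(P, 4))   # each column of V is scaled by sqrt(1 / 0.75), about 1.1547
```

When the primary factors are uncorrelated, $R^{-1} = I$ and the primary pattern coincides with the simple structure.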


Procedure 


To demonstrate the procedure for rotating an oblique simple structure into a hierarchical factor solution, a correlation model was constructed from


JOHN SCHMID AND JOHN M. LEIMAN 57 


TABLE 1

Correlation Matrix, R*

        1     2     3     4     5     6     7     8     9    10    11    12
 1   6400  7200  3136  2688  0983  0491  1290  0369  2903  1613  0645  0753
 2         8100  3528  3024  1106  0553  1452  0415  3266  ....  0726  0847
 3               4900  4200  0753  0377  0988  0282  2222  1235  0494  0576
 4                     3600  0645  0323  0847  0242  1905  1058  ....  ....
[Rows 5 to 12, and the entries shown as ...., are illegible in the source scan.]

*Communalities appear in the principal diagonal. Decimal points have been omitted.


TABLE 2
Primary Pattern, $P_1$

TABLE 3
Intercorrelations of Primary Factors, $R_1$

TABLE 4
Second-Order Primary Pattern, $P_2$

TABLE 5
Correlations Among Second-Order Primary Factors, $R_2$

[The entries of Tables 2 to 5 are illegible in the source scan.]




a postulated simple structure factor matrix. It should be emphasized, however, that any set of empirical variables which can be rotated to simple structure can also be put in this more interpretable and meaningful hierarchical form. That is, if simple structure exists by any definition, the procedure is applicable. The given correlation matrix is presented in Table 1. An oblique solution was developed by the multiple-group method. This oblique solution consists of a primary pattern, $P_1$, and the intercorrelations among the primary factors, $R_1$. These two matrices are presented in Tables 2 and 3. The oblique solution could equally well have been produced by rotation from another initial solution instead of by the multiple-group method.


The intercorrelations of the primary factors, $R_1$ (with communalities determined and placed in the diagonal elements), are then factored by any method. Usually it is most expeditious to carry out a common-factor analysis at each stage to separate the common-factor space from the unique-factor space. Rotation of these second-order factors is then performed to obtain the primary pattern of the second-order factors, $P_2$ (Table 4), and the intercorrelations of the second-order primary factors, $R_2$ (Table 5). A check may be made at this point, since $R_1$ (with communalities) $= P_2R_2P_2'$. Again, this $P_2$ may be developed by the construction of an oblique simple structure, $V_2$, which is then transformed into $P_2$ by the operation indicated in (12).

Since the second-order factors are correlated ($R_2$ is not an identity matrix), a third-order factor exists. Consequently, $R_2$ is factored. Factoring shows that there is one third-order factor and three unique factors, $B_3$ (see Table 6). The progressive factoring of higher orders is now complete. This information is used for developing the preferred hierarchical factor solution. To do this, the operation $P_2B_3$ is performed (Table 7) and the matrix of unique factors of $R_1$, $U_2$, is appended as shown in Table 8. That is, $B_2 = [P_2B_3 : U_2]$. It should be noted that $B_2B_2' = R_1$ (with unities in the diagonal of $R_1$). This matrix, $B_2$, is used as the transformation matrix for rotating the first-order oblique solution, $P_1$, into the final hierarchical solution, $B$ (Table 9), according to the operation

(13)   $B = P_1B_2$.


This procedure may be extended to higher orders if correlations are found 
among fourth-order or higher-order factors. 
TABLE 6
Third-Order Common and Unique Factors, $B_3$

TABLE 7
Orthogonalized Second-Order Common Factors, $P_2B_3$

TABLE 8
Orthogonalized Second-Order Common and Unique Factors, $B_2$

TABLE 9
Hierarchical Factor Solution, $B$

[The entries of Tables 6 to 9 are largely illegible in the source scan.]

It will be observed that this hierarchical solution contains 10 common factors, where all tests define factor I. Factors II, III, and IV are the next most complex factors. Each of these in turn can be broken down into the finer composites illustrated by factors V through X. These last six factors identify the six factors of the original oblique solution, $P_1$. It will be observed that this solution reproduces the communalities and the off-diagonal correlations of the original correlation matrix exactly. Furthermore, it furnishes the same factorial interpretation as is found in the oblique solution, $P_1$, which is the usual type of solution obtained by researchers. Ease of psychological interpretation has not been sacrificed by the use of the hierarchical solution, and what was concealed in the intercorrelations of the oblique axes now takes on added meaning in terms of the progressive groupings of the variables at higher levels.

It should be emphasized that even though the oblique solution, $P_1$, contains variables of complexity one only, this is not a restriction. Variables of any complexity may be used.


Discussion 


A question arises about the stability of the hierarchical solution upon modification of the battery of tests. Burt concludes [3, p. 70] that the hierarchical solution (designated by him as the group-factor solution) remains "stable, if not absolutely invariant, even when the battery of tests or traits is modified, e.g., when a comparatively small battery is enlarged by the addition of more tests or more groups of tests, or when a large battery is curtailed by the omission of tests." The introduction of a new group of tests which are unrelated to any group already in the battery would, of course, add a new group factor.

In all probability, selection, univariate and multivariate, and sampling variation would affect this model in the same manner as the simple structure model. These points concerning battery modification, selection, and sampling stability need further research for clarification.

Practical applications of this model will be greatly aided as more objective and analytical criteria and techniques for transformation to simple structure are achieved. Nevertheless, even with present methods of attaining simple structure, the hierarchical solution is useful.


Summary of Steps as Applied to Illustration

1. $R$, with communalities, was factored into $P_1$ and $R_1$ (Tables 1, 2, and 3), that is,

   $R$ (with communalities) $= P_1R_1P_1'$.

2. $R_1$, with communalities, was factored into $P_2$ and $R_2$ (Tables 4 and 5), that is,

   $R_1$ (with communalities) $= P_2R_2P_2'$,
   $R_1$ (with unities) $= P_2R_2P_2' + U_2^2$,

   where $U_2$ represents the diagonal matrix of unique factors of $R_1$.

3. $R_2$, with communalities, was factored into $P_3$ (Table 6). One common factor was found, i.e., $R_3$ was a unit scalar:

   $R_2$ (with communalities) $= P_3P_3'$,
   $R_2$ (with unities) $= P_3P_3' + U_3^2$.

4. When only one common factor remains, as in this illustration, factoring of higher-order matrices is completed. Otherwise, the procedure would be continued until $R_i$ becomes an identity matrix or a single highest-order factor remains. At this stage, these intermediate matrices are used for constructing a rotation matrix for transforming the primary pattern, $P_1$, into a hierarchical solution, $B$.

5. Form matrix $B_3$ by appending the unique-factor loadings of $R_2$ to $P_3$, that is,

   $B_3 = [P_3 : U_3]$. (Table 6).

   It follows that

   $R_2$ (with communalities) $= P_3P_3'$,
   $R_2$ (with unities) $= B_3B_3'$.

6. Carry out the matrix operation $P_2B_3$. (Table 7).

7. Form matrix $B_2$ by appending the unique-factor loadings of $R_1$ to $P_2B_3$, that is,

   $B_2 = [P_2B_3 : U_2]$. (Table 8).

   It follows that

   $R_1$ (with communalities) $= P_2B_3B_3'P_2'$,
   $R_1$ (with unities) $= B_2B_2'$.

8. The hierarchical solution, $B$, then is constructed by the operation

   $B = P_1B_2$. (Table 9).
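Steps 5 through 8 can be traced numerically on a small three-level case. This sketch is illustrative only: the patterns $P_1$, $P_2$, $P_3$ below are made-up values, not those of Tables 2 to 9, and steps 1 to 3 (the factor analyses and rotations) are not reproduced; instead $R_1$ and $R_2$ are built from the assumed patterns so that the identities stated in steps 5 and 7 can be checked.

```python
import numpy as np


def unique_part(common):
    """Diagonal matrix of unique-factor loadings: sqrt(1 - row sum of squares)."""
    return np.diag(np.sqrt(1.0 - np.sum(common ** 2, axis=1)))


# Hypothetical patterns (not the values of Tables 2-9).
P3 = np.array([[0.8], [0.6]])                 # 2 second-order factors on 1 third-order factor
P2 = np.array([[0.7, 0.0], [0.6, 0.0],
               [0.0, 0.8], [0.0, 0.5]])       # 4 first-order factors on 2 second-order factors
P1 = np.diag([0.9, 0.8, 0.7, 0.6])            # 4 variables of complexity one

# Stand-in for steps 1-3: build R_2 and R_1 (with unities) from the patterns.
R2 = P3 @ P3.T
np.fill_diagonal(R2, 1.0)
R1 = P2 @ R2 @ P2.T
np.fill_diagonal(R1, 1.0)

B3 = np.hstack([P3, unique_part(P3)])         # step 5: B_3 = [P_3 : U_3]
P2B3 = P2 @ B3                                # step 6
B2 = np.hstack([P2B3, unique_part(P2B3)])     # step 7: B_2 = [P_2 B_3 : U_2]
B = P1 @ B2                                   # step 8: B = P_1 B_2

print(B.shape)                                # (4, 7): 1 general + 2 + 4 group factors
print(np.allclose(B3 @ B3.T, R2), np.allclose(B2 @ B2.T, R1))   # True True
```

The column count mirrors the illustration: one factor from the highest level, one per unique factor at each lower level.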


REFERENCES
[1] Burt, C. Alternative methods of factor analysis. Brit. J. Psychol. (Statist. Sec.), 1949, 2, 98-121.
[2] Burt, C. Subdivided factors. Brit. J. Psychol. (Statist. Sec.), 1949, 2, 41-63.
[3] Burt, C. Group factor analysis. Brit. J. Psychol. (Statist. Sec.), 1950, 3, 40-75.
[4] Guttman, L. General theory and methods for matric factoring. Psychometrika, 1944, 9, 1-16.
[5] Harris, C. W. and Knoell, D. M. The oblique solution in factor analysis. J. educ. Psychol., 1948, 385-403.
[6] Holzinger, K. J. A simple method of factor analysis. Psychometrika, 1944, 9, 257-261.
[7] Holzinger, K. J. and Harman, H. H. Factor analysis. Chicago: Univ. Chicago Press, 1951.
[8] Thomson, G. H. The factorial analysis of human ability. New York: Houghton Mifflin, 1948.
[9] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.
[10] Tucker, L. R. The role of correlated factors in factor analysis. Psychometrika, 1940, 5, 141-152.
[11] Vernon, P. The structure of human abilities. New York: Wiley, 1950.


Manuscript received 1/26/56 
Revised manuscript received 6/9/56 


PSYCHOMETRIKA—VOL. 22, ΝΟ. 1 
MARCH, 1957 


A THEORY OF PATTERN ANALYSIS FOR THE PREDICTION 
OF A QUANTITATIVE CRITERION 
ARDIE LUBIN
WALTER REED ARMY INSTITUTE OF RESEARCH
AND
HOBART G. OSBURN
HUMAN RESOURCES RESEARCH OFFICE, GEORGE WASHINGTON UNIVERSITY


A method of pattern analysis is presented for the case of dichotomous items and a quantitative criterion. This "configural scale" has maximum validity in the least squares sense. A technique for computing the configural scale as a polynomial function of the item scores is given. Tests of significance are outlined for such questions as: Is there a linear or non-linear relation between the quantitative criterion and the item scores? Does the addition of certain items to the test increase the validity of the configural scale? Are all the items in the configural scale fully effective?


1. Introduction 


The purpose of this paper is to derive an optimal method of pattern analysis for the prediction of a quantitative criterion. Suppose each individual has taken a test of $t$ items, and each individual has a criterion score on a quantitative variable. What is the best possible way of predicting the criterion score from the individual's answer pattern? How can we make use of all the information given by the responses to the $t$ items?

A least squares method will be proposed as an adequate solution for the case of a quantitative criterion. This method is satisfactory if the objective is to minimize the sum of squared errors when predicting from the subject's answer pattern to his criterion score.

Guttman [2] and Rao [8] have already noted that when the criterion is qualitative, the maximum likelihood solution will produce the minimum number of misclassifications. Rao has given a general proof of this, which holds whether the predictors are quantitative or qualitative. The least squares solution presented in this paper is equivalent to using maximum likelihood when the distribution of criterion scores within the answer pattern is normal.

Meehl [7] called attention to a special case where two dichotomous items which correlated zero with the dichotomous criterion would give a validity of unity when the item patterns were scored by what Meehl termed the configural method. Meehl's configural scoring method can be derived from Rao's maximum likelihood approach. Horst [4] has shown that Meehl's configural scoring corresponds to a polynomial function of the dichotomous item scores.


2. Definition of the Configural Scale 


Given a test of $t$ dichotomous items, there will be $2^t$ possible answer patterns. If the mean criterion score is calculated for each answer pattern, there will be $2^t$ possible criterion means. Every individual in an answer pattern is assigned the same score, the mean criterion score for that answer pattern. This set of scores is called a configural scale (by analogy with Meehl's scoring method). It will be shown that an individual's configural score is, in the least squares sense, the best prediction of his criterion score.


3. Theorems on the Configural Scale 


THEOREM 1. The zero-order correlation of the configural scale with the criterion is equal to or greater than the correlation of the criterion with any other set of scores based on the answers to the $t$ dichotomous items.

In other words, the configural scale based on criterion means has maximum validity. The proof of this theorem follows from the least squares property of the mean. All individuals in an answer pattern have exactly the same item scores and cannot be distinguished from each other on the basis of the test. So they must all be assigned the same predicted criterion score. What score, when assigned to all individuals who have the same answer pattern, will produce the smallest sum of squared deviations from the observed criterion scores? This score is, of course, the answer-pattern mean.

The general Pearson correlation coefficient can be defined

(1)   $r = \sqrt{(T - W)/T}$,

where $T$ equals the sum of squared deviations about the general criterion mean, and $W$ equals the sum of squared deviations of the observed from the predicted criterion scores. Since $W$ is a minimum when the configural scale is used, $r$ must be a maximum.
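The configural scale and the coefficient in (1) can be computed directly from raw data. The sketch below is a minimal illustration, assuming each individual's answer pattern is given as a tuple of 0/1 item scores; the function name and the data are hypothetical. The data reproduce the kind of case Meehl [7] described: each item alone correlates zero with the criterion, while the answer pattern predicts it exactly.

```python
from collections import defaultdict

import numpy as np


def configural_scale(patterns, criterion):
    """Return each individual's configural score and the validity r of (1).

    patterns  -- answer patterns as tuples of 0/1 item scores (Yes = 1, No = 0)
    criterion -- the individuals' criterion scores
    """
    y = np.asarray(criterion, dtype=float)
    members = defaultdict(list)                    # answer pattern -> row indices
    for row, pattern in enumerate(patterns):
        members[tuple(pattern)].append(row)
    scores = np.empty_like(y)
    for rows in members.values():
        scores[rows] = y[rows].mean()              # answer-pattern mean
    T = np.sum((y - y.mean()) ** 2)                # deviance about the general mean
    W = np.sum((y - scores) ** 2)                  # observed minus predicted
    return scores, np.sqrt((T - W) / T)


# Hypothetical data of the kind described by Meehl [7].
patterns = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
criterion = [1, 1, 0, 0, 0, 0, 1, 1]
scores, r = configural_scale(patterns, criterion)
item1 = np.array([p[0] for p in patterns])
print(np.corrcoef(item1, criterion)[0, 1], r)   # item validity ~0, configural r = 1.0
```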

This general least squares technique of predicting quantitative scores 
from qualitative attributes has been known for some time. Guttman [2] 
gave essentially the same method in his section on “The prediction of a
quantitative variate from a set of attributes." He pointed out that the 
correlation in (1) is actually η, the correlation ratio. It is also equal to the 
product-moment correlation of the configural scores with the observed 
criterion scores. 

The coefficient in (1) can be defined in analysis of variance terms. Let $B$ equal the deviance (sum of squares) between the answer pattern means;


ARDIE LUBIN AND HOBART G. OSBURN 65 


let $W$ be the deviance (sum of squares) within answer patterns. Then

(2)   $r = \sqrt{B/(B + W)}$;

since $T = B + W$, this formula is equivalent to (1). In this way configural scale analysis can be translated into R. A. Fisher's terminology of between groups and within groups.

To summarize, a configural scale has been defined as the set of $2^t$ criterion averages, one for each answer pattern, and has been shown to possess maximum validity in the least squares sense.


THEOREM 2. The configural scale can be represented as a polynomial function of the item scores:

(3)   $\hat C = b_0 + b_1X_1 + b_2X_2 + \cdots + b_tX_t + b_{12}X_1X_2 + b_{13}X_1X_3 + \cdots + b_{123\cdots t}X_1X_2X_3\cdots X_t$.

As an example of Theorem 2, take the two-item configural scale set forth in Table 1, where $C_i$ is the criterion mean of the $i$th answer pattern.


TABLE 1
A Two-item Configural Scale

Answer        Items        Criterion
Pattern     1      2       Average
   1       yes    yes      $C_1$
   2       yes    no       $C_2$
   3       no     yes      $C_3$
   4       no     no       $C_4$
The polynomial predictor is

(4)   $\hat C = b_0 + b_1X_1 + b_2X_2 + b_{12}X_1X_2$,

where $X_1$ is the score on item 1,
$X_2$ is the score on item 2,
$\hat C$ is the predicted criterion score,
and $b_0$, $b_1$, $b_2$, and $b_{12}$ are the best fitting regression coefficients.

In this paper it is arbitrarily assumed that a No response is scored zero and a Yes response is scored unity. So, for the Yes-Yes answer pattern: $X_1 = 1$, $X_2 = 1$, and therefore $X_1X_2 = 1$. It follows that $\hat C_1 = b_0 + b_1 + b_2 + b_{12}$. For the Yes-No answer pattern: $X_1 = 1$, $X_2 = 0$, and therefore $X_1X_2 = 0$. It follows that $\hat C_2 = b_0 + b_1$. In a similar way, equations which involve only the unknown $b$'s and the predicted $\hat C$'s can be derived for each of the answer patterns.

Note that $X_1X_2$ turns out to be a dichotomous score of either zero or unity. In general, all possible multiplicative combinations of the item scores will be either zero or unity.

There are four unknown coefficients in (4); there are $2^2$, or four, means in Table 1. Therefore, an exact solution for the unknown coefficients, such that $\hat C_i = C_i$, is always possible. The four equations are

(5)   $C_1 = b_0 + b_1 + b_2 + b_{12}$,

(6)   $C_2 = b_0 + b_1$,

(7)   $C_3 = b_0 + b_2$,

(8)   $C_4 = b_0$.

The solution to this set of equations for the two-item case is

(9)   $b_0 = C_4$,

(10)   $b_1 = C_2 - C_4$,

(11)   $b_2 = C_3 - C_4$,

(12)   $b_{12} = C_1 + C_4 - C_2 - C_3$.
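The same coefficients can be obtained mechanically by building one row of polynomial terms per answer pattern and solving the square system (5) to (8). A minimal sketch, assuming Yes = 1 and No = 0 as in the text; the helper name and the criterion means are hypothetical.

```python
import itertools

import numpy as np


def design_row(pattern):
    """Polynomial terms of (3) for one answer pattern, constant first.

    Terms are ordered by subset size: 1, X_1, ..., X_t, X_1X_2, ..., X_1...X_t.
    """
    t = len(pattern)
    row = []
    for k in range(t + 1):
        for subset in itertools.combinations(range(t), k):
            row.append(float(np.prod([pattern[i] for i in subset])))  # empty product = 1
    return row


# Answer patterns in the order of Table 1: Yes-Yes, Yes-No, No-Yes, No-No.
patterns = [(1, 1), (1, 0), (0, 1), (0, 0)]
C = np.array([7.0, 5.0, 4.0, 3.0])            # hypothetical criterion means C_1..C_4
X = np.array([design_row(p) for p in patterns])
b = np.linalg.solve(X, C)                     # b_0, b_1, b_2, b_12
print(b)                                      # approximately [3, 2, 1, 1], as (9)-(12) give
```

The same `design_row` produces the $2^t$ terms for any number of items, so the approach extends unchanged to the three-item case.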


In a similar way, if there were three items, there would be $2^3 = 8$ unknown parameters in the polynomial prediction equation

(13)   $\hat C = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_{12}X_1X_2 + b_{13}X_1X_3 + b_{23}X_2X_3 + b_{123}X_1X_2X_3$.

Since there are 8 criterion means, this would again lead to the exact solution of a set of 8 equations in the 8 unknown coefficients $b_0, b_1, b_2$, etc. The square, or higher power, of any item score reduces to the item score itself ($0^2 = 0$, $1^2 = 1$), so all powers of $X$ are simply $X$, i.e., $X^k = X$. Therefore, powers of item scores need not appear in the polynomial. There will be $t(t-1)/2$ cross-product terms of the type $X_1X_2$, $X_1X_3$, etc., since this is the number of ways two things can be selected from a set of $t$ objects. There will be $t(t-1)(t-2)/3!$ cross-products of the type $X_1X_2X_3$, $X_1X_2X_4$, $X_1X_3X_4$, etc. Since there is one unknown constant for each cross-product term, the total number of unknown coefficients for all terms is

(14)   $1 + t + \dfrac{t(t-1)}{2!} + \dfrac{t(t-1)(t-2)}{3!} + \cdots + 1 = 2^t$.
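The total in (14) is the binomial theorem applied to $(1 + 1)^t$: the $k$th term counts the products of $k$ distinct item scores, and there are $\binom{t}{k} = t(t-1)\cdots(t-k+1)/k!$ of them.

```latex
\sum_{k=0}^{t} \binom{t}{k}
  = \binom{t}{0} + \binom{t}{1} + \binom{t}{2} + \cdots + \binom{t}{t}
  = (1 + 1)^t = 2^t,
\qquad\text{e.g. } t = 3:\quad 1 + 3 + 3 + 1 = 8 .
```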

COROLLARY TO THEOREM 2. Whenever the number of empty answer patterns for $t$ items is $g$, then the number of terms in the polynomial equation will be $2^t - g$.

Applying this corollary, if the number of filled answer patterns is $(t + 1)$ or less, and the $t$ items are linearly independent, then the linear multiple regression on the $t$ items will give maximum validity. Of course, there may be cases where the number of filled answer patterns is more than $(t + 1)$ and the linear regression still gives maximum validity. This case will be discussed in Section 4 in relation to tests of significance.

For example, consider the perfect Guttman scale, which has both properties: there are only $(t + 1)$ answer patterns, and the $t$ items are linearly independent. In Table 1, if the No-Yes answer pattern were empty, the two items would form a perfect Guttman scale. The polynomial predictor would become the linear equation

(15)   $\hat C = b_0 + b_1X_1 + b_2X_2$,

where

(16)   $b_0 = C_4$,

(17)   $b_1 = C_2 - C_4$,

and

(18)   $b_2 = C_1 - C_2$.

These solutions for the $b$'s make $\hat C_i = C_i$ for the $i$th answer pattern.

The Guttman scale score is defined as the sum of the item scores, $X_1 + X_2$. In other words, Guttman sets $b_0 = 0$ and $b_1 = b_2 = 1$ in (15). Guttman [3, p. 89] states that for a perfect scale this sum score has maximum validity: a multiple regression on the item scores is not necessary, because the scale score contains all the necessary information; the predictability of any outside variable from the scale scores is the same as its predictability from the multivariate distribution of the attributes, so that a zero-order correlation with the scale score is equivalent to a multiple correlation with the universe of attributes. Hence, in his account, scale scores provide an optimal quantification of the attributes for predicting any outside variable. This statement is correct only if the phrase "correlation ratio" is substituted for "zero-order correlation." There are many cases where the zero-order correlation of the scale score with the criterion is less than the multiple correlation of the criterion with the item scores. In order to demonstrate this, examine the perfect Guttman scale shown in Table 2.


TABLE 2
A Perfect Guttman Scale for Two Items

Answer        Items        Criterion
Pattern     1      2       Average
   1       yes    yes      $C_1$
   2       yes    no       $C_2$
   3       no     no       $C_3$


The best fitting multiple linear regression is

(19)   $\hat C = b_0 + b_1X_1 + b_2X_2$,

where

(20)   $b_0 = C_3$,

(21)   $b_1 = C_2 - C_3$,

(22)   $b_2 = C_1 - C_2$.

Since the scale score weights both items equally, it can have a validity equal to the multiple correlation only if $b_1$ equals $b_2$. This imposes the restriction that

(23)   $C_1 - C_2 = C_2 - C_3$.


The total score can be regarded as a special case of the configural scale: if $b_1 = b_2 = \cdots = b_t = 1$ and all other coefficients equal zero, then

(24)   $\hat C = X_1 + X_2 + \cdots + X_t$.

Similarly, the multiple regression scale is the case where only the linear portion of (3) is used, i.e.,

(25)   $\hat C = b_0 + b_1X_1 + b_2X_2 + \cdots + b_tX_t$.

Obviously, the validity of these scales can be ranked as follows:

configural ≥ multiple regression ≥ total score.

It should be emphasized, of course, that this statement holds only for the sample being analyzed. Because of the loss of degrees of freedom for the configural scale, this relation may not be found when the scale derived in one sample is applied to another sample.
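The in-sample ranking follows because the three predictors are nested: the total score uses a subset of the linear space spanned by the item scores, which in turn is a subset of the full polynomial. A minimal sketch on hypothetical two-item data computes all three validities as correlations between the criterion and its least squares fit.

```python
import numpy as np


def multiple_r(columns, y):
    """Correlation between y and its least squares fit on a constant plus `columns`."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(A @ coef, y)[0, 1]


# Hypothetical two-item data, two individuals per answer pattern.
X1 = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
X2 = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=float)
y = np.array([9, 7, 4, 6, 5, 3, 1, 2], dtype=float)

r_configural = multiple_r([X1, X2, X1 * X2], y)   # full polynomial (3)
r_regression = multiple_r([X1, X2], y)            # linear portion, (25)
r_total = multiple_r([X1 + X2], y)                # total score, (24)
print(r_configural, r_regression, r_total)
```

Applying the fitted weights to a fresh sample, as the text cautions, can reverse this order; the sketch only covers the analysis sample.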


4. Analysis of Variance Tests of Significance 


As has been mentioned previously, the least squares configural scale is the maximum likelihood solution when the distribution of criterion scores within each answer pattern is normal. If, further, the $2^t$ criterion score variances are homogeneous, then the analysis of variance technique can be used for tests of significance. The polynomial function used in this paper can be shown to be an exact algebraic transformation of Fisher's analysis of variance mathematical model for the case of equal answer pattern frequencies. This is true whenever the systematic portion of the analysis of variance model is equal to the cell mean.

These tests can be used to answer such questions as: Is the validity of the configural scale significantly greater than zero? Is the validity of the configural scale significantly greater than that of the total score? Will the linear multiple regression give maximum validity, or are non-linear terms necessary? If $m$ items are added to the test, will the configural scale validity increase? Are there certain terms in the polynomial predictor which do not contribute significantly to the validity? All these questions and other similar ones can be answered by the general F-ratio test.

For example, suppose the question arises, "Is the validity of the configural scale greater than zero?" The exact F-ratio test* is

(26)   $F = \dfrac{B/(2^t - 1)}{W/(N - 2^t)}$,

with $(2^t - 1)$ over $(N - 2^t)$ degrees of freedom.
Some definitions are needed to make the terms in (26) clear from a computational point of view. Let

$t$ be the number of dichotomous items,
$N$ be the total number of individuals,
$n_i$ be the number of subjects in the $i$th answer pattern,
$C_{ij}$ be the criterion score for the $j$th individual in the $i$th answer pattern,
$C_{i.}$ be the average criterion score for the $i$th answer pattern, i.e., the configural score for the $i$th answer pattern,
$C_{..}$ be the average criterion score for all $N$ individuals.

Then $B$ is the between sum of squares, i.e.,

(27)   $B = \sum_{i=1}^{2^t} n_i (C_{i.} - C_{..})^2$.

$W$ is the within or residual sum of squares, i.e.,

(28)   $W = \sum_{i=1}^{2^t} \sum_{j=1}^{n_i} (C_{ij} - C_{i.})^2$,

and

(29)   $r^2 = B/T = \eta^2$, where $T = B + W$.

*In formula (26) and elsewhere it is assumed that all $2^t$ answer patterns are filled. If $g$ answer patterns are empty, then the degrees of freedom for $W$ equal $(N - 2^t + g)$ and the degrees of freedom for $B$ equal $(2^t - 1 - g)$.
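Formulas (26) to (29) translate directly into a one-way analysis of variance over answer patterns. The sketch below is a minimal illustration on hypothetical data; like (26), it assumes every one of the $2^t$ patterns is filled and simply refuses otherwise rather than applying the adjusted degrees of freedom of the footnote. A p-value would additionally require an F distribution function, which is not computed here.

```python
import numpy as np


def configural_f_test(patterns, criterion):
    """F ratio (26) for the hypothesis that the configural validity is zero.

    Assumes all 2^t answer patterns are filled, as in the footnote to (26).
    Returns F, its degrees of freedom, and r^2 = B/T from (29).
    """
    patterns = [tuple(p) for p in patterns]
    y = np.asarray(criterion, dtype=float)
    t, N = len(patterns[0]), len(y)
    cells = {}
    for pattern, score in zip(patterns, y):
        cells.setdefault(pattern, []).append(score)
    if len(cells) != 2 ** t:
        raise ValueError("(26) assumes every one of the 2^t answer patterns is filled")
    grand = y.mean()                                                         # C..
    B = sum(len(c) * (np.mean(c) - grand) ** 2 for c in cells.values())      # (27)
    W = sum(np.sum((np.array(c) - np.mean(c)) ** 2) for c in cells.values())  # (28)
    df1, df2 = 2 ** t - 1, N - 2 ** t
    F = (B / df1) / (W / df2)
    return F, df1, df2, B / (B + W)


patterns = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 0)]
criterion = [9, 7, 4, 6, 5, 3, 1, 2]          # hypothetical scores
F, df1, df2, r2 = configural_f_test(patterns, criterion)
print(round(F, 3), df1, df2)                  # 8.897 3 4
```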


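Similarly, one can find out if $S$, the total score obtained by just adding up the unweighted item scores, is sufficient to produce maximum validity. Let $r_s$ be the correlation of $S$ with the criterion; then

(30)   $F = \dfrac{(r^2 - r_s^2)/(2^t - 2)}{(1 - r^2)/(N - 2^t)}$,

with $(2^t - 2)$ over $(N - 2^t)$ degrees of freedom. In the same way, one can test whether $r^2$ is significantly larger than $R^2$, the squared multiple correlation based on the $t$ items:

(31)   $F = \dfrac{(r^2 - R^2)/(2^t - 1 - t)}{(1 - r^2)/(N - 2^t)}$,

with $(2^t - 1 - t)$ over $(N - 2^t)$ degrees of freedom.

Another question that can be answered is as follows: Suppose $m$ items are added to a $k$-item test. Is the validity of the $(k + m)$-item configural scale greater than that of the $k$-item configural scale? The F-ratio significance test is

(32)   $F = \dfrac{(W_k - W_{k+m})/(2^{k+m} - 2^k)}{W_{k+m}/(N - 2^{k+m})} = \dfrac{(r_{k+m}^2 - r_k^2)/(2^{k+m} - 2^k)}{(1 - r_{k+m}^2)/(N - 2^{k+m})}$,

where $W_k$ and $r_k^2$ refer to statistics of the $k$-item configural scale, and $W_{k+m}$ and $r_{k+m}^2$ to those of the $(k + m)$-item configural scale. The degrees of freedom are $(2^{k+m} - 2^k)$ over $(N - 2^{k+m})$.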

In general, it is possible to test hypotheses about any subset of terms in the polynomial predictor. Let $H_0$ refer to a null hypothesis which specifies, on an a priori basis, the terms to be used. Let $H_1$ refer to the non-null hypothesis which places no restrictions on the $2^t$ parameters. Then the general formula for $F$ is

(33)   $F = \dfrac{(W_0 - W_1)/(\nu_0 - \nu_1)}{W_1/\nu_1}$,

where
$\nu_0$ = $N$ minus the number of parameters used in predicting the criterion scores according to $H_0$,
$W_0$ = the deviance (sum of squared errors of prediction) obtained by applying $H_0$,
$\nu_1 = N - 2^t$,
$W_1$ = the deviance obtained by applying the polynomial predictor,
and the degrees of freedom are $(\nu_0 - \nu_1)$ over $\nu_1$.

Another way of writing it is

(34)   $F = \dfrac{(r_1^2 - r_0^2)/(\nu_0 - \nu_1)}{(1 - r_1^2)/\nu_1}$,

where
$r_1^2 = (T - W_1)/T$,   $r_0^2 = (T - W_0)/T$,
and $T$ is the deviance about the general mean.

Equations (33) and (34) give the general solution for testing what are known as "linear hypotheses" [5, pp. 298-302]. This allows the reader to construct his own test of significance for any question about the polynomial predictors.
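As one instance of (33), the question "will the linear multiple regression give maximum validity?" compares a design without the interaction term ($H_0$) against the full polynomial ($H_1$). The sketch below assumes both design matrices include the constant column and have full column rank, so the number of parameters is the number of columns; the helper names and the data are hypothetical.

```python
import numpy as np


def deviance(Z, y):
    """Sum of squared errors of prediction for the least squares fit of y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return float(np.sum((y - Z @ coef) ** 2))


def linear_hypothesis_f(Z0, Z1, y):
    """F of (33): Z0 holds the terms allowed by H_0, Z1 the full polynomial (H_1)."""
    N = len(y)
    W0, W1 = deviance(Z0, y), deviance(Z1, y)
    nu0, nu1 = N - Z0.shape[1], N - Z1.shape[1]
    F = ((W0 - W1) / (nu0 - nu1)) / (W1 / nu1)
    return F, nu0 - nu1, nu1


# Hypothetical two-item data; H_0: no interaction term (linear regression suffices).
X1 = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
X2 = np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=float)
y = np.array([9, 7, 4, 6, 5, 3, 1, 2], dtype=float)
ones = np.ones_like(y)
Z0 = np.column_stack([ones, X1, X2])
Z1 = np.column_stack([ones, X1, X2, X1 * X2])
F, df1, df2 = linear_hypothesis_f(Z0, Z1, y)
print(F, df1, df2)
```

Computing $r_0^2$ and $r_1^2$ from the same deviances and substituting into (34) gives the identical value, since $W = T(1 - r^2)$.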
5. Discussion

Many ingenious methods of pattern and profile analysis are being used today in an attempt to increase the predictability of the criterion. Gaier and Lee [1] have given a partial review of this literature. Presumably, one could take a set of data and compare all these methods to see which has the greatest validity. This would be an inefficient way of solving the problem. As Horst has said [4], work in this area will be much more fruitful when more rigorous mathematical concepts are developed to take the place of verbal descriptions and analyses based on empirical or trial and error manipulations.

Given the case of $t$ dichotomous items and a quantitative criterion, the least squares approach shows that the configural scale, which is a polynomial function of the $t$ item scores, possesses maximum validity. To the extent that any of the present techniques of pattern analysis can reach this maximum validity, they are special cases of the polynomial predictor.


It is always possible to write a mathematical description of the configural scale and apply the usual matrix algebra theorems to the result. For example, the polynomial predictor in matrix form is

(35)   $C = Xb$,

where $C$ is the $2^t$ by one column vector of observed criterion averages,
$X$ is a $2^t$ by $2^t$ matrix of zero-one entries, where the rows represent answer patterns and the columns represent terms of the polynomial,
$b$ is the $2^t$ by one column of coefficients.

If all the $2^t$ answer patterns are filled, and $X$ is non-singular, then

(36)   $b = X^{-1}C$

is an exact solution. If some answer patterns are empty, $X$ can still be made into a square non-singular matrix by eliminating the corresponding rows and columns, and $b$ is found as above.

The matrix form also bears on the relative importance of each term in the polynomial. If it has been hypothesized that only $k$ of the terms are needed, for example that all coefficients of the type $b_{ij}$ will be zero, a least squares solution can be obtained which uses only the $k$ specified terms. If the $k$ specified terms do not give an adequate fit, then it is necessary to compute the exact least squares solution.

The approximate least squares solution is as follows: Let $X_k$ be the $2^t$ by $k$ matrix obtained by selecting the specified $k$ columns from $X$. Then

(37)   $b_k = (X_k'X_k)^{-1}X_k'C$.

The exact least squares solution is as follows: Let $Z_k$ be an $N$ by $k$ matrix whose general element, $z_{jm}$, is the score of the $j$th individual on the $m$th term of the polynomial. Then, given that the $j$th individual is in the $i$th answer pattern, the $j$th row of $Z_k$ is exactly equal to the $i$th row of $X_k$. Essentially, $Z_k$ is an expanded form of $X_k$ in which the $i$th row has been repeated $n_i$ times. Let $C$ be an $N$-rowed column vector whose $j$th element is the criterion score of the $j$th individual. Then

(38)   $b_k = (Z_k'Z_k)^{-1}Z_k'C$

is the set of regression coefficients which give the exact least squares fit.

Equation (38), combined with the F test of (33), provides a test of whether any specified set of item interactions is related to the criterion. It can be a powerful tool for testing psychological hypotheses about the relation between a subject's responses to the items and his criterion score. This could be the most useful function of configural analysis.
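The difference between (37) and (38) is one of weighting: (37) fits the $2^t$ pattern means with equal weight, while (38) fits every individual, so patterns with larger $n_i$ count more. A minimal NumPy sketch, assuming the data are given as a mapping from answer pattern to the criterion scores in that pattern; the helper names and numbers are hypothetical, and the normal equations are solved directly to mirror the formulas.

```python
import numpy as np


def term_values(pattern, subsets):
    """Values of the chosen polynomial terms (products of item scores) for one pattern."""
    return [float(np.prod([pattern[i] for i in s])) for s in subsets]


def approximate_and_exact(cells, subsets):
    """Coefficients from (37) (pattern means, unweighted) and (38) (individual scores).

    cells   -- dict: answer pattern -> list of criterion scores in that pattern
    subsets -- the k specified terms, each a tuple of item indices; () is the constant
    """
    patterns = list(cells)
    Xk = np.array([term_values(p, subsets) for p in patterns])        # 2^t by k
    Cbar = np.array([np.mean(cells[p]) for p in patterns])            # pattern means
    b_approx = np.linalg.solve(Xk.T @ Xk, Xk.T @ Cbar)                # (37)
    n = [len(cells[p]) for p in patterns]
    Zk = np.repeat(Xk, n, axis=0)                                     # row i repeated n_i times
    C = np.concatenate([np.asarray(cells[p], dtype=float) for p in patterns])
    b_exact = np.linalg.solve(Zk.T @ Zk, Zk.T @ C)                    # (38)
    return b_approx, b_exact


# Hypothetical unequal frequencies; k = 3 terms: constant, X_1, X_2 (no interaction).
cells = {(1, 1): [9, 7, 8], (1, 0): [5], (0, 1): [4], (0, 0): [1, 2, 3]}
print(approximate_and_exact(cells, [(), (0,), (1,)]))
```

With equal frequencies $Z_k'Z_k = nX_k'X_k$ and $Z_k'C = nX_k'\bar C$, so the two solutions coincide; with unequal $n_i$ they generally differ, which is why the exact form is needed.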
Another possibility is to use the configural technique for item analysis. This would involve an empirical search for patterns with or without the use of hypotheses. However, considerable caution is needed in such empirical applications of configural scoring, since the degrees of freedom used for computing the regression coefficients increase exponentially with the number of selected items. In general, any procedure involving configural scoring implies a very small number of items and a large number of subjects. Also, unless the items have been specially constructed, or there are good theoretical grounds for believing that non-linear relations exist, the usual total score will probably give maximum validity.

It is necessary to be careful in generalizing from the analysis sample to future samples. Because of the loss of degrees of freedom, there will be a sizeable decrease in the cross-validity. One procedure for guarding against errors in generalization to future samples would be as follows: (a) Compute the back-validity (validity on the analysis sample) and test whether it differs from zero. (b) If the back-validity is significantly greater than zero, test to see if it is significantly greater than the multiple correlation and total score back-validities. (c) If the above tests are positive, the cross-validities for each model should be estimated by Lord's method [6] to see if the configural scale has any practical advantage. (d) As a final safeguard, the actual cross-validities can be computed and tested for significance. If the estimated cross-validity differences fall to zero, then it is unnecessary to analyze the cross-validation sample.

The authors are grateful to Professor Charles F. Wrigley and Dr. Maurice Lorr for advice and critical comment; and to Mrs. Ruth Heitman for her typing services.

REFERENCES
[1] Gaier, E. L. and Lee, M. The configural approach to predictive measurement. Psychol. Bull., 1953, 50, 141-149.
[2] Guttman, L. In P. Horst (Ed.), The prediction of personal adjustment. Bulletin 48. New York: Social Science Research Council, 1941.
[3] Guttman, L. In S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, et al., Measurement and prediction. Princeton, New Jersey: Princeton Univ. Press, 1950.
[4] Horst, P. Pattern analysis and configural scoring. J. clin. Psychol., 1954, 10, 3-11.
[5] Kendall, M. G. The advanced theory of statistics. Vol. II. London: Griffin, 1946.
[6] Lord, F. M. Efficiency of prediction when a regression equation from one sample is used in a new sample. Research Bulletin 50-40. Educational Testing Service, 1950.
[7] Meehl, P. E. Configural scoring. J. consult. Psychol., 1950, 14, 165-171.
[8] Rao, C. R. Utilization of multiple measurements in problems of biological classification. J. roy. statist. Soc. B, 1948, 10, 159-203.

Manuscript received 1/9/56
Revised manuscript received 3/5/56


PSYCHOMETRIKA—VOL. 22, No. 1 
MARCH, 1957 


THE EXPECTED VARIANCE OF THE SAMPLING ERRORS FOR 
A SET OF ITEM-CRITERION CORRELATIONS 


HUBERT E. BROGDEN
PERSONNEL RESEARCH BRANCH 
THE ADJUTANT GENERAL’S OFFICE 


An expression for the expected variance of the sampling errors for the validities of a set of correlated items is developed that is computationally feasible when the number of items is large. Since the item difficulties are assumed to be constant, the estimate must be applied to pools or sub-pools of items reasonably homogeneous with respect to difficulty.


In item analysis and in a number of related problems, an expression giving the variance of the sampling errors of a set of item validities would often be useful. The standard error of the item validity coefficients is not a satisfactory estimate, since it is well known that the sampling errors in the validities of a set of correlated variables are themselves correlated. Wishart [1] has presented a general though complex solution giving the sampling distribution of the covariance matrix for a set of correlated variables. His solution is not, however, feasible and may not be appropriate when applied to the problems arising in dealing with the sampling variation of a set of item validities. This note will propose a solution to the problem of estimating the variance, but not the distribution, of the sampling errors for a set of item validities that appears to be both simple and feasible.

At least one application of such an estimate is obvious. Many investigators, upon finding that 5 per cent of a set of items are valid at the 5 per cent level of confidence, conclude that their item analysis data are of no value. It is hoped that the sampling estimate to be presented will permit a more accurate conclusion in problems of this nature.


Definition of Symbols

$r_i$ = the point-biserial (or product-moment) correlation, in a sample, between any of a set of items and an external criterion. The items ($i = 1, 2, \ldots, n$) take on values of 1 for the correct choice and 0 for the incorrect choice. The "correct" alternative must be determined, though arbitrarily, before computing the item validities.
$\bar r_i = \bar r_{ic}$, the point-biserial correlation between any item and the criterion in the universe.
$\sigma_i = \sqrt{p_i q_i}$, the standard deviation of an item score in a sample.
$\bar\sigma_i$ = the standard deviation of an item score in the universe.
$t = \sum_i x_i$, the sum of the item scores $x_i$: a score obtained by summing the number of correct (as defined above) alternatives.
N = the number of individuals in the sample.
$E(\sigma^2_{r-\bar r})$ = the expected variance of the errors in a set of item validities for a sample.

*The opinions expressed are those of the author and do not reflect official Department of the Army policy.


Assumptions

1. It is assumed that all items have equal difficulty and, as a corollary, that $\sigma_i$ is constant across items.

2. It is assumed that $\sigma_i$ and $\sigma_t$ are satisfactory estimates of, respectively, $\bar\sigma_i$ and $\bar\sigma_t$. This assumption is similar to, but less restrictive than, the usual assumption in similar developments that the predictors remain fixed in going from the sample to the universe.

3. It is assumed that t and the criterion are normally distributed.


The Derivation

The problem is to determine, in a form feasible for calculation, an expression for the expected variance of the errors for a set of item validities in a sample. Since the errors are the discrepancies between the universe and sample values, the expected value of the variance of the errors in the validities of a set of items has the basic definition

(1) $E(\sigma^2_{r-\bar r}) = E\left[\frac{1}{n}\sum_i (r_i - \bar r_i)^2 - \left(\frac{1}{n}\sum_i (r_i - \bar r_i)\right)^2\right]$

(2) $\qquad = \frac{1}{n}\sum_i E(r_i - \bar r_i)^2 - E\left[\frac{1}{n}\sum_i r_i - \frac{1}{n}\sum_i \bar r_i\right]^2.$


From a well-known formula, the correlation of the sum of the items (t) with the criterion may be written, if $\sigma_i$ is assumed to be constant,

(3) $r_{tc} = \frac{\sigma_i}{\sigma_t}\sum_i r_i,$

and, consequently,

(4) $r_{tc}\,\frac{\sigma_t}{\sigma_i} = \sum_i r_i.$

By similar reasoning,

(5) $\bar r_{tc}\,\frac{\bar\sigma_t}{\bar\sigma_i} = \sum_i \bar r_i.$
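Identity (3) holds exactly in any sample once every item has the same standard deviation. The sketch below checks it numerically on hypothetical data: each simulated item has exactly half of its responses correct (so every $\sigma_i$ equals .5), and the criterion is an arbitrary noisy function of the total score, chosen only for illustration. Standard deviations use divisor N throughout; the identity holds for any common divisor.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pop_sd(x):
    """Standard deviation with divisor N."""
    m = sum(x) / len(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))


rng = random.Random(1957)
N, n = 300, 12

# Hypothetical items: each has exactly N/2 ones, so sigma_i = .5 for every item.
items = []
for _ in range(n):
    column = [1] * (N // 2) + [0] * (N - N // 2)
    rng.shuffle(column)
    items.append(column)

t = [sum(column[k] for column in items) for k in range(N)]        # total score
criterion = [0.3 * t[k] + rng.gauss(0.0, 1.0) for k in range(N)]  # hypothetical

r_i = [pearson(column, criterion) for column in items]  # item validities
sigma_i = pop_sd(items[0])
sigma_t = pop_sd(t)

r_tc_direct = pearson(t, criterion)
r_tc_from_3 = sigma_i / sigma_t * sum(r_i)  # right-hand side of (3)
print(r_tc_direct, r_tc_from_3)
```

The two printed values should agree to floating-point precision; the example relies only on the equal-$\sigma_i$ condition of assumption 1.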




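Substituting in (2),

(6) $E(\sigma^2_{r-\bar r}) = \frac{1}{n}\sum_i E(r_i - \bar r_i)^2 - E\left[\frac{1}{n}\,r_{tc}\frac{\sigma_t}{\sigma_i} - \frac{1}{n}\,\bar r_{tc}\frac{\bar\sigma_t}{\bar\sigma_i}\right]^2.$

From assumption 2, $\bar\sigma_t$ equals $\sigma_t$ and $\bar\sigma_i$ equals $\sigma_i$. Reducing further,

(7) $E(\sigma^2_{r-\bar r}) = \frac{1}{n}\sum_i E(r_i - \bar r_i)^2 - \frac{\sigma_t^2}{n^2\sigma_i^2}\,E(r_{tc} - \bar r_{tc})^2.$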


The expected value of the square of $(r_i - \bar r_i)$ or of $(r_{tc} - \bar r_{tc})$ is the average of their squared values over an infinite series of samples and is, consequently, also equal to their sampling variance. Formulas are available to evaluate them, and $\sigma_t$ can be obtained from the sample. Thus, (7) provides a solution to the original problem, although this solution may be somewhat tedious, since the sampling variance for each item validity must be determined.
In most item analysis problems, the $r_i$ should be sufficiently close to zero that $1/N$ may be regarded as a satisfactory estimate of their sampling variance. In this event, since the sampling variance of $r_{tc}$ is approximately $(1 - r_{tc}^2)^2/N$, (7) will reduce to

(8) $E(\sigma^2_{r-\bar r}) = \frac{1}{N}\left[1 - (1 - r_{tc}^2)^2\,\frac{\sigma_t^2}{n^2\sigma_i^2}\right].$

(8) gives a feasible solution to the original problem.
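As a minimal sketch, (8) can be evaluated directly from four sample quantities plus N. The function name and the input values below are hypothetical: the first call describes an illustrative 50-item pool at p = .5 (so $\sigma_i$ = .5), and the last two calls correspond to the special cases taken up in the Discussion (perfectly intercorrelated items with $r_{tc}$ = 0, and uncorrelated items).

```python
def expected_error_variance(N, n, sigma_i, sigma_t, r_tc):
    """Equation (8): expected variance, across a set of n items, of the
    sampling errors in the item validities (item validities near zero)."""
    ratio = sigma_t ** 2 / (n ** 2 * sigma_i ** 2)
    return (1.0 - (1.0 - r_tc ** 2) ** 2 * ratio) / N


# Hypothetical pool: 400 cases, 50 items at p = .5, sigma_t = 6.0, r_tc = .30
print(expected_error_variance(N=400, n=50, sigma_i=0.5, sigma_t=6.0, r_tc=0.30))

# Items all perfectly intercorrelated (sigma_t = n * sigma_i) and r_tc = 0
print(expected_error_variance(N=100, n=10, sigma_i=0.5, sigma_t=5.0, r_tc=0.0))

# Uncorrelated items (sigma_t^2 = n * sigma_i^2): close to 1/N when n is large
print(expected_error_variance(N=200, n=40, sigma_i=0.5, sigma_t=40 ** 0.5 * 0.5, r_tc=0.0))
```

The square root of the returned value gives a standard deviation of the validity errors on the same scale as the $r_i$ themselves.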

(8) will reduce further if the average item intercorrelation is zero. If this is true, the average of the off-diagonal entries of the full symmetric matrix of item covariances (whose sum equals $\sigma_t^2$) will also equal zero, and $\sigma_t^2$ will reduce to $n\sigma_i^2$. If n is large,

(9) $E(\sigma^2_{r-\bar r}) \approx 1/N.$

(9) is, of course, the expected result if full statistical independence of the items is assumed. The derivation just presented shows that a less restrictive assumption permits the use of this simple formula. In practice, if $\sigma_t^2$ does not exceed $n\sigma_i^2$, $1/N$ may be used in place of (8) to give an estimate of the variance of the sampling errors across a set of items. It should be noted that the less restrictive assumption applies in estimating the variance of the sampling errors; nothing has been demonstrated regarding the distribution of sampling errors.

While the writer had primary interest in the expected variance of the sampling errors of the validities of a set of dichotomous items, a form of (7) will apply to the validities of a set of continuous predictors. The assumption of equal item difficulty is, of course, unnecessary. It must be assumed that the standard deviations of the individual predictors and the standard deviation of the sum of the predictors are the same in the sample and universe, and that t is redefined as the sum of the predictors with each predictor converted to unit standard deviation form. With these assumptions,

(10) $E(\sigma^2_{r-\bar r}) = \frac{1}{n}\sum_i E(r_i - \bar r_i)^2 - \frac{\sigma_t^2}{n^2}\,E(r_{tc} - \bar r_{tc})^2.$


Discussion 


To apply the formula given in (8), t must be determined for each case in the sample, and $\sigma_t$ must be computed. The scoring run to compute t is the greater part of the labor involved. By the definition given to t, the sign of an item in such a scoring run must correspond to the sign used when the item validities were initially computed.

The formula behaves as would be expected in those special cases where the solution is obvious. If the intercorrelations of the items are all plus one, the variance of the errors of a set of item validities should be zero, and the solution by the formula yields zero variance of the errors. If the items are independent of each other, the errors should be independent of each other, as shown in (9), and the formula should and does simplify to the sampling variance of a correlation coefficient.
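The independent-items case can also be checked by simulation. The sketch below is a hypothetical Monte Carlo check, not part of the original paper: it draws mutually independent dichotomous items (p = .5) and an independent normal criterion, so every universe validity is zero, and averages the across-item variance (divisor n, as in (1)) of the sample validities over replications. Since $\sigma_t^2 \approx n\sigma_i^2$ and $r_{tc} \approx 0$ here, (8) predicts a value near $(1/N)(1 - 1/n)$; the seed, sample sizes, and number of replications are arbitrary.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_error_variance(N=200, n=40, reps=50, seed=22):
    """Mean over replications of the across-item variance of r_i when items
    and criterion are mutually independent (every universe validity is 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        criterion = [rng.gauss(0.0, 1.0) for _ in range(N)]
        errors = []
        for _ in range(n):
            item = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
            errors.append(pearson(item, criterion))  # r_i minus a universe value of 0
        mean = sum(errors) / n
        total += sum((e - mean) ** 2 for e in errors) / n
    return total / reps


N, n = 200, 40
sim = simulate_error_variance(N, n)
predicted = (1 / N) * (1 - 1 / n)  # (8) with sigma_t^2 = n * sigma_i^2 and r_tc = 0
print(sim, predicted)
```

The agreement is only approximate: the simulated $\sigma_t^2$ equals $n\sigma_i^2$ only on average, and the null sampling variance of a correlation is $1/(N-1)$ rather than exactly $1/N$.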

Since the p-values of the items were assumed to be constant in deriving the formula, a pool of items involving a considerable range of difficulty will have to be subdivided into pools of constant item difficulty before the formula is applied. In the author's opinion, a range of difficulty of at least .10 can be permitted without introducing serious error, since the variation of $\sigma_i$ is quite small within such a p-value range.


REFERENCE 


[1] Wishart, J. The generalized product-moment distribution in samples from a normal 
multivariate population. Biometrika, 1928, 20A, 32. 


Manuscript received 6/6/56




A NECESSARY AND SUFFICIENT FORMULA FOR MATRIC 
FACTORING 


LOUIS GUTTMAN


CENTER FOR ADVANCED STUDY IN THE BEHAVIORAL SCIENCES*
For the purpose of extracting factors from matrices, it is shown that a certain formula is both necessary and sufficient. The formula may be applied either to the score matrix or to the correlation matrix (assuming the communality problem is solved). As many factors as desired can be extracted in one operation. Such a formulation is useful for teaching as well as computing purposes, and includes all techniques of factor extraction as special cases.


Let A be an arbitrary (real) matrix of order p × q and rank r. It is desired to extract factors from A by finding a matrix $A_1$ of rank s, where $s \le r$, of the form

(1) $A_1 = BDC,$

where D is non-singular and of order s, such that $A_2$ shall be of rank r − s, where

(2) $A_2 = A - A_1.$

This requires further that B and C be of rank s and of orders p × s and s × q, respectively.
Such a problem occurs in factor analysis in at least two different but closely related ways:

(a) A may be the observed score matrix after unique-factor scores are subtracted out, for q individuals on p tests. Then B (or BD) can be regarded as common-factor loadings of the tests, and DC (or C) as common-factor scores of the respondents.

(b) A may be the correlation matrix with communalities in the main diagonal. In this case, p = q, C = B′, and A, $A_1$, $A_2$, and D are restricted to being Gramian. Then B again gives common-factor loadings of the tests, while now D is the inverse of the covariance matrix of the common factors, being a diagonal matrix when the common factors are orthogonal.


*On leave from the Israel Institute for Applied Social Research. This research was facilitated in part by a grant from the Lucius N. Littauer Foundation to the American Committee for Social Research in Israel, Inc.




In either case, when s = 1, a single factor is extracted from A by (2), as by the centroid method, principal axis, or other ways of reducing r by 1. When s > 1, several factors are extracted simultaneously, as in multiple-group methods. When s = r, or $A_2 = 0$, all factors are extracted in one step (cf. [1, 2]). The relationship between the factoring of scores and the factoring of correlation coefficients has been analyzed in [1, 2].

It has been shown in [1] that a sufficient formula for $A_1$ is as follows. Let X and Y be arbitrary weight matrices of orders s × p and s × q, respectively, and such that $XAY'$ is non-singular. Let

(3) $D = (XAY')^{-1}.$

Thus, D is of rank and order s. Compute B and C by the formulas

(4) $B = AY', \quad C = XA.$

Then if $A_1$ is computed by formula (1), $A_1$ must be of rank s and $A_2$ in (2) of rank r − s. For factoring the correlation matrix as in case (b) above, let Y = X.
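A small numerical sketch of the sufficient formula, using NumPy: A is a hypothetical p × q matrix constructed to have rank r, and X and Y are random weight matrices (any choice with $XAY'$ non-singular should serve). The helper names `rank` and `extract` are illustrative, and the explicit rank tolerance is an assumption chosen to absorb floating-point error.

```python
import numpy as np


def rank(M, tol=1e-8):
    """Numerical rank with an explicit singular-value tolerance."""
    return int(np.linalg.matrix_rank(M, tol=tol))


def extract(A, X, Y):
    """Formulas (3), (4), (1), (2): remove s factors from A with weights X, Y."""
    D = np.linalg.inv(X @ A @ Y.T)  # (3) D = (XAY')^{-1}
    B = A @ Y.T                     # (4) B = AY'
    C = X @ A                       # (4) C = XA
    A1 = B @ D @ C                  # (1) A_1 = BDC
    return A1, A - A1               # (2) A_2 = A - A_1


rng = np.random.default_rng(0)
p, q, r, s = 6, 8, 4, 2
A = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank r

X = rng.standard_normal((s, p))
Y = rng.standard_normal((s, q))
A1, A2 = extract(A, X, Y)
print(rank(A), rank(A1), rank(A2))  # should be r, s, r - s

# s = r: all factors are extracted in one step, leaving A_2 = 0
X_all = rng.standard_normal((r, p))
Y_all = rng.standard_normal((r, q))
_, A2_all = extract(A, X_all, Y_all)
print(np.allclose(A2_all, 0.0, atol=1e-8))
```

Different random X and Y give different factorings of the same A, which is the point made below: methods differ only in the choice of weights.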

This sufficient technique for extracting factors is actually only a generali- 
zation of Lagrange's technique for reducing bilinear forms, as pointed out 
in [1]. 

It is of considerable interest to inquire* as to whether any other kind 
of formula is possible for A, in (1), keeping D non-singular, but removing 
conditions (3) and (4). An important restriction in (4) is that the factor 
matrices B and C are linear transformations of A. Is it possible for factors 
to exist that are not such functions of A? 

The answer turns out to be in the negative. If $A_1$ is of rank s and reduces the rank of A to r − s, then $A_1$ in the form (1) must always have B, C, and D of the forms (4) and (3). Our formulas are necessary as well as sufficient.

For the proof, suppose $A_1$ is of the form (1) and is of rank s, and D is non-singular of order s. Thus, B and C are of orders p × s and s × q, respectively. Define the partitioned matrix E to be

(5) $E = \begin{bmatrix} A & B \\ C & D^{-1} \end{bmatrix}.$

E is A enlarged by s rows and s columns. By direct multiplication it is verified that

(6) $E = \begin{bmatrix} I_p & BD \\ 0 & I_s \end{bmatrix}\begin{bmatrix} A_2 & 0 \\ C & D^{-1} \end{bmatrix},$

where $I_p$ and $I_s$ are the unit matrices of order p and s, respectively, and $A_2$ is the residual matrix defined by (2).


*This problem was suggested to the writer by Dr. W. A. Gibson.




Let t be the rank of $A_2$. Since the first matrix on the right of (6) is clearly non-singular, and the rank of the second matrix is clearly s + t (the sum of the ranks of $A_2$ and D), the rank of E must be s + t. Therefore, a necessary and sufficient condition that t = r − s is that the rank of E equal r. But in the right of (5), A by itself is already of rank r, so a necessary and sufficient condition that E be of rank r is that the last submatric row in the right of (5) be linearly dependent on the first submatric row, or that there exist an X such that

(7) $C = XA, \quad D^{-1} = XB.$

Similarly, the last submatric column must be linearly dependent on the first submatric column, or there exists a Y such that

(8) $B = AY', \quad D^{-1} = CY'.$


The first parts of (7) and (8) yield (4); substituting the first part of (7) in the last part of (8) yields (3).

*Note that X and Y need not be uniquely determined when p > r and q > r, respectively.
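The steps of the proof can be traced numerically. In the NumPy sketch below (all matrices hypothetical), E is built as in (5), the factorization (6) is checked by direct multiplication, and rank(E) is compared with s + rank($A_2$). A second, deliberately arbitrary choice of B and C, not of the forms (7) and (8), shows the rank condition failing. The rank tolerance is again an assumed value.

```python
import numpy as np


def rank(M, tol=1e-8):
    """Numerical rank with an explicit singular-value tolerance."""
    return int(np.linalg.matrix_rank(M, tol=tol))


rng = np.random.default_rng(1)
p, q, r, s = 5, 7, 3, 2
A = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))  # rank r
X = rng.standard_normal((s, p))
Y = rng.standard_normal((s, q))

D_inv = X @ A @ Y.T        # (3) gives D^{-1} = XAY'
D = np.linalg.inv(D_inv)
B, C = A @ Y.T, X @ A      # (4)
A2 = A - B @ D @ C         # (2)

E = np.block([[A, B], [C, D_inv]])                                    # (5)
left = np.block([[np.eye(p), B @ D], [np.zeros((s, p)), np.eye(s)]])
right = np.block([[A2, np.zeros((p, s))], [C, D_inv]])
print(np.allclose(E, left @ right))  # (6)
print(rank(E), s + rank(A2), r)      # the three should coincide

# Arbitrary B and C (not linear functions of A): rank(A_2) is no longer r - s
B_bad = rng.standard_normal((p, s))
C_bad = rng.standard_normal((s, q))
print(rank(A - B_bad @ D @ C_bad), r - s)
```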

Thus, both the necessity and the sufficiency of the factoring formulas have now been proved simultaneously, whereas only sufficiency was proved in [1]. All possible factoring methods, whether applied directly to the score matrix or to the correlation matrix, can differ only in the choice of the weight matrices X and Y. This fact not only gives a unified and simplified approach to practical computing procedures (cf. [2, 4]) but also, as Lubin has pointed out, serves as a simple basis for teaching factor analysis to beginning students [5].

It must be cautioned, however, that the above formulas assume the 
communality problem solved. The gravity of this assumption is analyzed 
in [3]. 

REFERENCES 

[1] Guttman, L. General theory and methods for matric factoring. Psychometrika, 1944, 9, 1-

[2] Guttman, L. Multiple group methods for common-factor analysis: their basis, computation, and interpretation. Psychometrika, 1952, 17, 209-222.

[3] Guttman, L. The determinacy of factor score matrices, with implications for five other basic problems of common-factor theory. Brit. J. statist. Psychol., 1955, 8, 65-81.

[4] Harman, H. H. The square root method and multiple group methods of factor analysis. Psychometrika, 1954, 19, 39-55.

[5] Lubin, A. ... Introduction to Factor Analysis. Personnel Psychology, 1954, 7, 577-581.


Manuscript received 9/19/55


Revised manuscript received 12/14/55




EXACT PROBABILITIES FOR CONTINGENCY TABLES USING 
BINOMIAL COEFFICIENTS 


JAMES M. SAKODA
AND
BURTON H. COHEN


UNIVERSITY OF CONNECTICUT 


The use of binomial coefficients in place of factorials to shorten the calculation of exact probabilities for 2 × 2 and 2 × r contingency tables is discussed. A useful set of inequalities for estimating the cumulative probabilities in the tail of the distribution from the probability of a single table is given. A table of binomial coefficients with four significant places and n through 60 is provided.


A 2 × 2 contingency table and a numerical example are represented as follows:

      a       b      a + b            7     8    15
      c       d      c + d            8    37    45
    a + c   b + d      N             15    45    60


Under the hypothesis of independence the exact probability, p, of specified values of a, b, c, d given the marginal totals (a + b), (c + d), (a + c), (b + d) can be written either in terms of factorials or binomial coefficients [1, 2]:

(1) $p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,N!} = \frac{\binom{a+c}{a}\binom{b+d}{b}}{\binom{N}{a+b}}.$
Using binomial coefficients for our example,

$p = \frac{\binom{15}{7}\binom{45}{8}}{\binom{60}{15}} = \frac{6435 \cdot 2156 \cdot 10^{5}}{5319 \cdot 10^{10}} = .02608.$
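The two forms of (1) can be compared with exact integer arithmetic. The sketch below (function names hypothetical) uses Python's `math.comb` and `fractions.Fraction`, so the factorial and binomial forms can be tested for exact equality before any rounding, and then reproduces the probability of the example table.

```python
from fractions import Fraction
from math import comb, factorial


def p_factorial(a, b, c, d):
    """Equation (1), factorial form."""
    N = a + b + c + d
    num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    den = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(N)
    return Fraction(num, den)


def p_binomial(a, b, c, d):
    """Equation (1), binomial-coefficient form."""
    N = a + b + c + d
    return Fraction(comb(a + c, a) * comb(b + d, b), comb(N, a + b))


p = p_binomial(7, 8, 8, 37)
print(p == p_factorial(7, 8, 8, 37), round(float(p), 5))
```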
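Use of binomial coefficients in the calculation of cumulative probabilities, P, for a given table and those more extreme than it permits the possibility of cumulating cross products on a desk calculator as follows (with the table labeled so that ad is the smaller of the two cross products):

(2) $P = \frac{1}{\binom{N}{a+b}}\sum_{k=0}^{\min(a,\,d)} \binom{a+c}{a-k}\binom{b+d}{b+k}.$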


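For the numerical example,

$P = \frac{\binom{15}{7}\binom{45}{8} + \binom{15}{8}\binom{45}{7} + \binom{15}{9}\binom{45}{6}}{\binom{60}{15}} = \frac{6435 \cdot 2156 \cdot 10^{5} + 6435 \cdot 4538 \cdot 10^{4} + 5005 \cdot 8145 \cdot 10^{3}}{5319 \cdot 10^{10}} = .03234.$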
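Formula (2) can be sketched in the same way. The function below (hypothetical name) first relabels the table, by swapping its columns, so that ad is the smaller cross product, and then cumulates the cross products of binomial coefficients; the optional `terms` argument truncates the series as in the hand computation. With exact coefficients instead of the four-digit table values, the three-term sum should agree with .03234 to within that rounding.

```python
from math import comb


def tail_probability(a, b, c, d, terms=None):
    """Equation (2): one-tailed probability of the observed 2 x 2 table and
    all more extreme tables with the same margins."""
    if a * d > b * c:
        a, b, c, d = b, a, d, c      # swap columns so that ad < bc
    max_terms = min(a, d) + 1        # k runs while a - k >= 0 and k <= d
    n_terms = max_terms if terms is None else min(terms, max_terms)
    total = sum(comb(a + c, a - k) * comb(b + d, b + k) for k in range(n_terms))
    return total / comb(a + b + c + d, a + b)


print(round(tail_probability(7, 8, 8, 37, terms=3), 5))  # three terms, as above
print(round(tail_probability(7, 8, 8, 37), 5))           # all terms
```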




This is the probability for a one-tailed test based on three terms only. No more than four or five terms are generally necessary to obtain a fairly accurate probability. It can be shown that in the critical region of the distribution the error in omitting terms will be smaller than the probability of the last table which is utilized and equal to or larger than the probability of the first table which is not utilized. If the use of another term produces negligible change in the P value, no further calculation is necessary. In the example, the probability of the next term in the series is .00007, which changes P to .03241 and makes the error less than .00007.


Binomial Coefficients, $_nC_r$

       n=1   n=2   n=3   n=4   n=5   n=6   n=7   n=8   n=9  n=10
r=1      1     2     3     4     5     6     7     8     9    10
r=2            1     3     6    10    15    21    28    36    45
r=3                  1     4    10    20    35    56    84   120
r=4                        1     5    15    35    70   126   210
r=5                              1     6    21    56   126   252

        n=11    n=12    n=13    n=14    n=15    n=16    n=17    n=18    n=19    n=20
r=1       11      12      13      14      15      16      17      18      19      20
r=2       55      66      78      91     105     120     136     153     171     190
r=3      165     220     286     364     455     560     680     816     969    1140
r=4      330     495     715    1001    1365    1820    2380    3060    3876    4845
r=5      462     792    1287    2002    3003    4368    6188    8568  1163-1  1550-1
r=6      462     924    1716    3003    5005    8008  1238-1  1856-1  2713-1  3876-1
r=7      330     792    1716    3432    6435  1144-1  1945-1  3182-1  5039-1  7752-1
r=8      165     495    1287    3003    6435  1287-1  2431-1  4376-1  7558-1  1260-2
r=9       55     220     715    2002    5005  1144-1  2431-1  4862-1  9238-1  1680-2
r=10      11      66     286    1001    3003    8008  1945-1  4376-1  9238-1  1848-2

Entries are given to four significant digits; the number after the hyphen is the power of ten (e.g., 1163-1 = 1163 × 10^1).
[Table continues for n = 21 through 60; these entries are not recoverable.]
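The table can be regenerated with exact integer arithmetic. The sketch below assumes that an entry such as 1163-1 stands for 1163 × 10^1, with four significant digits rounded half up; `four_digit` and `table_row` are hypothetical helper names. Only the entries for r up to n/2 are produced, the rest following by the symmetry $_nC_r = {}_nC_{n-r}$.

```python
from math import comb


def four_digit(value):
    """Write an integer to four significant digits in the table's style,
    e.g. 11628 -> '1163-1', meaning 1163 x 10^1."""
    if value < 10_000:
        return str(value)
    exponent = len(str(value)) - 4
    mantissa = (value + 5 * 10 ** (exponent - 1)) // 10 ** exponent  # round half up
    if mantissa == 10_000:                # e.g. 99995 rounds up to 1000-2
        mantissa, exponent = 1000, exponent + 1
    return f"{mantissa}-{exponent}"


def table_row(n):
    """Entries nCr for r = 1, ..., n // 2."""
    return [four_digit(comb(n, r)) for r in range(1, n // 2 + 1)]


print(table_row(20))
print(four_digit(comb(60, 15)))  # the denominator used in the worked example
```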
It can also be shown that the following inequalities* exist:

$p\left[1 + \frac{ad}{(b+1)(c+1)}\right] < P < p\left[1 + \frac{ad}{(b+1)(c+1) - (a-1)(d-1)}\right],$

where p and P are defined in (1) and (2), and ad is taken to be the smaller and bc the larger of the two products. For the example,

$.02608\left(1 + \frac{64}{304}\right) < P < .02608\left(1 + \frac{64}{255}\right),$

$.03157 < P < .03263.$
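The bounds need only the single-table probability p and the four cell frequencies. The sketch below (hypothetical function name) applies the same column swap as before so that ad is the smaller product. Because it starts from the unrounded p rather than .02608, the bounds may differ from those above in the last digit.

```python
from math import comb


def tail_bounds(a, b, c, d):
    """Lower and upper bounds on the tail probability P from the inequalities."""
    if a * d > b * c:
        a, b, c, d = b, a, d, c  # relabel so that ad < bc
    p = comb(a + c, a) * comb(b + d, b) / comb(a + b + c + d, a + b)
    lower = p * (1 + a * d / ((b + 1) * (c + 1)))
    upper = p * (1 + a * d / ((b + 1) * (c + 1) - (a - 1) * (d - 1)))
    return lower, upper


low, high = tail_bounds(7, 8, 8, 37)
print(round(low, 5), round(high, 5))
```

The lower bound keeps only the first two terms of the series; the upper bound replaces every later term ratio by the larger quantity $(a-1)(d-1)/[(b+1)(c+1)]$ and sums the resulting geometric series.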


In the accompanying table, binomial coefficients, $_nC_r$, are given to four significant digits. Values were calculated to six significant digits, rounded off to four, and then checked against a recent source [5]. A table of logarithms of binomial coefficients, available in Hald [4] to n = 100, can be substituted for a table of binomial coefficients.

In the case of r × 2 tables, the probability of a specified table given the column sums m and n, and row sums (a + b), (c + d), (e + f), ..., is

$p = \frac{(a+b)!\,(c+d)!\,(e+f)!\cdots m!\,n!}{a!\,b!\,c!\,d!\,e!\,f!\cdots N!} = \frac{\binom{a+b}{a}\binom{c+d}{c}\binom{e+f}{e}\cdots}{\binom{N}{m}}.$

Here also the use of binomial coefficients is economical. The procedure is still laborious, however, since it is necessary to lay out all of the possible tables to find those which are equally or less probable than the one in question [3].
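The r × 2 formula can be sketched in the same way. The function below (hypothetical name) takes the rows of an r × 2 table as (left, right) pairs, uses the first column sum as m, and returns an exact fraction; the cell counts in the usage line are hypothetical. For a 2 × 2 table it reduces to (1).

```python
from fractions import Fraction
from math import comb, factorial, prod


def rx2_probability(rows):
    """Binomial form, for rows [(a, b), (c, d), (e, f), ...]."""
    m = sum(left for left, _ in rows)  # first column sum
    N = sum(left + right for left, right in rows)
    return Fraction(prod(comb(left + right, left) for left, right in rows), comb(N, m))


def rx2_probability_factorial(rows):
    """Factorial form, for comparison."""
    m = sum(left for left, _ in rows)
    N = sum(left + right for left, right in rows)
    n = N - m
    num = prod(factorial(left + right) for left, right in rows) * factorial(m) * factorial(n)
    den = prod(factorial(left) * factorial(right) for left, right in rows) * factorial(N)
    return Fraction(num, den)


table = [(3, 1), (2, 4), (0, 5)]  # hypothetical 3 x 2 table
print(rx2_probability(table), rx2_probability(table) == rx2_probability_factorial(table))
```

Finding the full tail still requires enumerating every table with the same margins, as the text notes; the sketch only evaluates one table.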


REFERENCES
[1] Federighi, E. The use of chi square in small samples. Amer. sociol. Rev., 1950, 15, 777-779.
[2] Fisher, R. A. Statistical methods for research workers. New York: Hafner, 1954.
[3] Freeman, G. H. and Halton, J. H. Note on an exact treatment of contingency, goodness of fit, and other problems of significance. Biometrika, 1951, 38, 141-149.
[4] Hald, A. Statistical tables and formulas. New York: Wiley, 1952.
[5] Miller, J. C. P., editor. Tables of binomial coefficients. Royal Society mathematical tables Vol. 3. Cambridge: University Press, 1954.


Manuscript received 3/14/55 
Revised manuscript received 11/29/55 


*We are indebted to Professor John W. Tukey for calling our attention originally to a similar set of inequalities, which we modified slightly.




A STOCHASTIC MODEL FOR ROTE SERIAL LEARNING 
RICHARD C. ATKINSON*


INDIANA UNIVERSITY†


A model for the acquisition of ... in an anticipatory rote serial ... The model is developed in detail for the case ... and employed to fit data where the list length ... intertrial interval is considered; some predictions are derived and checked.


This paper represents a preliminary attempt at quantitative theorizing in the area of rote serial learning. The model is applicable to experimental situations employing the anticipation method [6] and deals with the acquisition of correct responses, anticipatory responses, perseverative responses, and failures-to-respond. In addition, direct applicability of the model is limited to situations restricted as follows: (a) moderate presentation rate, (b) dissimilar intralist words, (c) familiar and easily pronounced words. The explanation for these restrictions is considered later.


Model

The model makes use of the conceptual formulation of the stimulating situation introduced by Estes [3] and elaborated by Estes and Burke [4]. The general assumptions are: (a) the effect of a stimulating situation upon an organism is made up of many component events; (b) when a situation is repeated over a series of trials, any one of these component stimulating events may occur on some trials and fail to occur on others. Rather than review the rationale of these assumptions, the reader is referred to the Estes-Burke paper, which is helpful to an understanding of the present work.
Figure 1 schematically presents the rote serial learning situation. The successive word exposures in a list of r + 1 words are indicated by $W_1, W_2, \ldots, W_{r+1}$, where $W_1$ is the cue for S's first anticipation on each run through the list. $R_i$ represents a hypothesized response associated with the i + 1st word presentation: the response of "reading" $W_{i+1}$. On the other hand, $R_j(i)$ is the response recorded by the experimenter to the ith word presentation and can be (a) a correct anticipation of the i + 1st word when j = i, (b) an incorrect anticipation when j ≠ i, or (c) a failure-to-respond when the j subscript is omitted. (Symbols and their meanings are listed in Appendix B.)

*The author wishes to thank Professors C. J. Burke and W. K. Estes for advice and assistance in carrying out this research.
†Now at Stanford University.


[Diagram: word exposures $W_1, W_2, \ldots, W_r, W_{r+1}$, with the anticipatory responses $R(1), R(2), \ldots, R(r)$.]

FIGURE 1
Schematic representation of the anticipatory rote serial learning situation.


A period h is defined as the time of a single word exposure, and a trial refers to one run through the list. Since the removal of one word is followed immediately by the presentation of the next, a trial is of time h(r + 1). The intertrial interval is represented as a series of k subintervals each of length h; thus, the intertrial interval is of time kh. When there are r + 1 words in a list, the list length is designated as r; this reflects the fact that the r + 1st word is not a cue for an anticipatory response.

The ith word presentation is represented conceptually as a set of stimulus elements $S_i$, where the sets are pairwise disjoint, and hence the intersection of the r + 1 sets is the null set. The number of elements in $S_i$ is N, where N is invariant over i, and a parent set $S^*$ is defined such that the union of the r + 1 sets is a subset of $S^*$. On a given presentation of the ith word a sample of elements from $S_i$ is effective; the likelihood of any element from $S_i$ being in the sample is $\theta_i$, where $0 < \theta_i < 1$. (Derivations presented in this paper are carried out under the simplifying assumption that all elements in $S_i$ are equally likely to occur on any trial.) Therefore, given the ith word presentation, a sample of size $N\theta_i$ is drawn from $S_i$.

Conditional relations, or connections, between response classes and stimulus elements are defined as in other papers on statistical learning theory. The response classes $R_1, R_2, \ldots, R_r$, and $\bar R$ (failure-to-respond) define a partition of $S^*$ into subsets $S_1^*, S_2^*, \ldots, S_r^*$, and $\bar S^*$. Elements in $S_1^*$ are conditioned to response class $R_1$, etc. The concept of a partition implies that every element of $S^*$ must be conditioned to either $R_1, R_2, \ldots, R_r$, or $\bar R$, but that no element may be conditioned to more than one. For each element in $S_i$ a quantity $P(i, j; n)$ is defined which represents the probability that an element from set $S_i$ is conditioned to response class $R_j$ ... element from $S_i$ is conditioned at ... response.




The anticipatory response at position i on trial n is assumed to be a function of the stimulus elements sampled from $S_i$ on that trial. Specifically, the probability of $R_j(i)$ is the ratio of the number of sampled elements from $S_i$ conditioned to the response class $R_j$ to the number of elements sampled from $S_i$. Since $\theta_i$ is constant for all elements in $S_i$, the probability of $R_j(i)$ on trial n is the expected value of $P(i, j; n)$.
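The response rule can be illustrated by simulation. In the sketch below, which is not the author's procedure, a hypothetical set $S_i$ of 20 elements has 8 elements conditioned to the correct class, 4 to another class, and 8 to failure-to-respond; each element enters the sample independently with probability θ = .3, and the response probabilities on a trial are taken as the proportions of sampled elements in each class. Averaged over many trials (skipping the rare empty samples), the estimates should approach .4, .2, and .4, the proportions of conditioned elements, in line with the statement that the response probability is the expected value of $P(i, j; n)$.

```python
import random
from collections import Counter


def response_probabilities(conditioned, theta, trials, seed=0):
    """Monte Carlo estimate of the probability of each response class.

    conditioned: list giving the response class of every element of S_i.
    theta: probability that any one element is in the sample on a trial."""
    rng = random.Random(seed)
    totals = Counter()
    used = 0
    for _ in range(trials):
        sample = [cls for cls in conditioned if rng.random() < theta]
        if not sample:  # an empty sample gives no ratio; skip it
            continue
        used += 1
        for cls, count in Counter(sample).items():
            totals[cls] += count / len(sample)
    return {cls: total / used for cls, total in totals.items()}


S_i = ["correct"] * 8 + ["other"] * 4 + ["none"] * 8  # hypothetical S_i
est = response_probabilities(S_i, theta=0.3, trials=20000)
print(est)
```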

For each element sampled from $S_i$ on trial n it is postulated that there is: (a) a probability $\lambda$ that the element is returned to $S^*$ during the h-interval immediately following the one in which it was sampled; (b) a probability $\lambda(1 - \lambda)$ that it is returned to $S^*$ during the second h-interval following the one in which it was sampled; (c) a probability $\lambda(1 - \lambda)^2$ that it is returned to $S^*$ during the third h-interval following the one in which it was sampled; and so on. The probability that an element will eventually be returned to $S^*$ is unity since

(1) $\sum_{z=1}^{\infty} \lambda(1 - \lambda)^{z-1} = 1.$
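The return-time assumption makes the number of h-intervals an element stays out of $S^*$ a geometric random variable. The sketch below uses an illustrative λ = .3 that is not taken from the paper; it checks numerically that the probabilities in (1) sum to one and that simulated return intervals have a mean close to 1/λ, the mean of this geometric distribution.

```python
import random


def return_probabilities(lam, z_max):
    """lambda * (1 - lambda)^(z - 1): return during the z-th h-interval."""
    return [lam * (1 - lam) ** (z - 1) for z in range(1, z_max + 1)]


def draw_return_interval(lam, rng):
    """Simulate one element: in each h-interval it is returned with prob. lambda."""
    z = 1
    while rng.random() >= lam:
        z += 1
    return z


lam = 0.3
print(sum(return_probabilities(lam, 200)))  # partial sum of (1)

rng = random.Random(6)
draws = [draw_return_interval(lam, rng) for _ in range(20000)]
print(sum(draws) / len(draws), 1 / lam)
```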
The phrase "available at position i" is used to refer to an element sampled from some set and not yet returned to $S^*$ during the h-interval in which $W_i$ is presented. The notion of an element being available at a position other than the one at which it was sampled is one way of formalizing the concept of trace stimuli. Parenthetically, note that the probability of an anticipatory response at position i is defined in terms of the stimulus elements sampled from $S_i$ and is not affected by elements which are available at position i but sampled from a stimulus set other than $S_i$.

The conditioned status of elements sampled from $S_i$ upon their return to $S^*$ depends on the anticipatory response made at position i. If a sample is drawn from $S_i$ which elicits a correct anticipatory response, $R_i(i)$, then all elements in the sample become conditioned to the response class $R_i$ and, independent of the time that an element is available, are returned to $S^*$ conditioned to that response class. On the other hand, if the sample elicits a response other than a correct one, all elements in the sample revert to being conditioned to the response class $\bar R$, and there is a specific probability that the elements will be conditioned to the $R_j$ responses which occur before they are returned to $S^*$. That is, given an incorrect anticipation or a failure-to-respond, all sampled elements become conditioned to the response class $\bar R$ and then: (a) a proportion β of the sampled elements are conditioned to the response class $R_i$ when $R_i$ occurs, a proportion λ of the elements are then returned to $S^*$, and the remaining (1 − λ) are carried on into the next h-interval where, again, β of the remaining elements are conditioned to the response class $R_{i+1}$ when $R_{i+1}$ occurs and (1 − β) remain as they were in the previous interval; (b) λ(1 − λ) are now returned to $S^*$ and (1 − λ)² are carried on, where β are connected to the response class $R_{i+2}$ when $R_{i+2}$ occurs and (1 − β) remain as they were in the previous interval; and so on.

Finally, it is assumed that nothing which occurs during the intertrial 
interval will change the conditional status of the elements not yet returned 
to S* at the beginning of this interval. That is, elements returned during 
h-intervals of the intertrial interval have the same conditional status as 
elements returned in the last h-interval of the list presentation. 

More generally stated, if a sample of elements elicits a response which is confirmed as correct (reinforced), then each element in the sample becomes conditioned to that response and will remain conditioned unless the element is sampled on some later trial and this new sample elicits an incorrect response. If a sample leads to an incorrect response, then the elements in the sample revert to being conditioned to the response class $\bar R$ and have a probability β of being conditioned to the response class $R_j$ associated with the $R_j$ responses which occur before the element is returned to $S^*$. The conditioning proportion β can be interpreted as the probable occurrence of the implicit response $R_i$ to the i + 1st word presentation. This interpretation does not affect the quantitative formulation of the model.

The present analysis of serial responding requires a modification of the notion of a sampling constant introduced in other papers on statistical learning theory. $\theta_i$ is postulated to be a function of the number and order of the words that have preceded the $i$th word. Once again, consider intervals of time h. If the word exposure has been preceded by an infinite number of h-intervals which do not contain word exposures, then the sampling constant is $\theta_1$; if, on the other hand, the word exposure has been preceded by an infinite number of h-intervals each of which contained a word exposure, the sampling constant is $\theta_\infty$. Let $c = \theta_1 - \theta_\infty$, where $c > 0$ and, necessarily, $c < 1$. Further, designate a decay constant $\eta$ such that $0 < \eta < 1$. If a series of successive word exposures occurs, and is preceded by an infinite number of h-intervals which do not contain word exposures, then (a) the sampling constant associated with the second word exposure is $\theta_1 - c\eta$; (b) the sampling constant associated with the third word is $\theta_1 - c[\eta + \eta(1-\eta)]$; (c) the sampling constant for the fourth word is $\theta_1 - c[\eta + \eta(1-\eta) + \eta(1-\eta)^2]$; and so on. Thus, if the intertrial interval is infinite (i.e., each run through the list is preceded by an infinite number of h-intervals which do not contain word exposures), the sampling constant associated with set $S_i$ on any run through the list is


$$\theta_i = \theta_1 - c\left[1 - (1-\eta)^{i-1}\right]. \tag{2}$$

An inspection of this equation indicates that $\theta_i$, defined over list positions, has a maximum at position one and approaches $\theta_1 - c$ (where $0 < \theta_1 - c < 1$) as $i$ becomes large.
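Summing the series in (a) to (c) term by term should reproduce the closed form (2). The short Python sketch below checks this numerically; the function names are hypothetical, $\theta_1$ and $c$ are the values reported in the Application to Data section, and $\eta = 0.30$ is an arbitrary illustrative choice, not a fitted value.

```python
def theta_series(i, theta1, c, eta):
    """Sampling constant for the i-th word, built term by term as in (a)-(c)."""
    decrement = sum(eta * (1 - eta) ** k for k in range(i - 1))
    return theta1 - c * decrement


def theta_closed(i, theta1, c, eta):
    """Closed form of equation (2): theta_1 - c[1 - (1 - eta)^(i - 1)]."""
    return theta1 - c * (1 - (1 - eta) ** (i - 1))


# theta1 and c as reported for the r = 18 fit; eta is a hypothetical stand-in.
THETA1, C, ETA = 1.00, 0.64, 0.30

for i in (1, 2, 3, 4, 18):
    print(i, round(theta_closed(i, THETA1, C, ETA), 4))
```

The printed values fall from $\theta_1$ at position one toward $\theta_1 - c = .36$, the bound stated in the text.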
The formulation of the sampling constant requires a uniform activity during intervals which do not contain word exposures; $\theta_1$ is postulated to be a function of the type of activity.

The equations specified by the above assumptions can now be written. Consider the case in which the intertrial interval is "long," for purposes of the model infinite. This case proves to be simpler than that in which the intertrial interval is "short" because in the infinite interval all elements sampled from $S_i$ on trial $n$ are returned to $S^*$ before the beginning of trial $n + 1$ (see equation 1). (Perseverative errors are not possible for the infinite intertrial interval, and their consideration is deferred until discussion of the short interval case.)

Given a list length $r$ and an infinite intertrial interval, the expected values of the probabilities of correct anticipatory responses on trial $n + 1$ to the exposure of $W_r$, $W_{r-1}$, and $W_{r-2}$ are

$$C(r; n+1) = (1-\theta_r)\,C(r;n) + \theta_r\left\{C(r;n) + [1 - C(r;n)]\,\beta\right\}, \tag{3}$$

$$C(r-1; n+1) = (1-\theta_{r-1})\,C(r-1;n) + \theta_{r-1}\left\{C(r-1;n) + [1 - C(r-1;n)]\left[\lambda\beta + (1-\lambda)\beta(1-\beta)\right]\right\}, \tag{4}$$

$$C(r-2; n+1) = (1-\theta_{r-2})\,C(r-2;n) + \theta_{r-2}\left\{C(r-2;n) + [1 - C(r-2;n)]\left[\lambda\beta + \lambda(1-\lambda)\beta(1-\beta) + (1-\lambda)^2\beta(1-\beta)^2\right]\right\}. \tag{5}$$

More generally, 


$$C(i; n+1) = (1-\theta_i)\,C(i;n) + \theta_i\left\{C(i;n) + [1 - C(i;n)]\,\beta A_i\right\}, \tag{6}$$

where

$$A_i = \frac{\lambda + (1-\lambda)\beta\left[(1-\lambda)(1-\beta)\right]^{r-i}}{1 - (1-\lambda)(1-\beta)}. \tag{7}$$

Inspection of (7) indicates that $A_i$, defined over list positions, is bounded between zero and unity. The function assumes a minimum at position one and increases as $i$ becomes large to a maximum value of unity at position $r$.

The solution of difference equation (6) is

$$C(i;n) = 1 - [1 - C(i;0)]\,(1 - \theta_i\beta A_i)^n \tag{8}$$

(cf. [5]).
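Equation (7) can be cross-checked against the verbal assumptions by summing, interval by interval, the chance that an element sampled after an error ends up in the correct class $R_{i+1}$. The Python sketch below does this and also iterates (6) to confirm the closed form (8). The indexing is an assumption made for this sketch: after an error at position $i$ there are $r - i + 1$ further word intervals, which is what makes $A_r = 1$ as the text states. All function names are hypothetical.

```python
def A_series(i, r, lam, beta):
    """A_i summed over the h-intervals that follow an error at position i.

    Assumed indexing: r - i + 1 word intervals follow, in which
    R'_{i+1}, ..., R'_{r+1} occur.  The element ends in R_{i+1} if it takes
    R_{i+1} (probability beta) and no later R' captures it before it is
    returned to S*.
    """
    m = r - i + 1
    prob_correct = 0.0
    for k in range(1, m):  # returned to S* right after the k-th interval
        prob_correct += lam * (1 - lam) ** (k - 1) * beta * (1 - beta) ** (k - 1)
    # Not returned during the list: status is frozen after the last interval.
    prob_correct += (1 - lam) ** (m - 1) * beta * (1 - beta) ** (m - 1)
    return prob_correct / beta  # the model writes this probability as beta * A_i


def A_closed(i, r, lam, beta):
    """Closed form used for equation (7)."""
    q = (1 - lam) * (1 - beta)
    return (lam + (1 - lam) * beta * q ** (r - i)) / (1 - q)


def C_iterate(n, theta, a_i, beta, c0=0.0):
    """Apply difference equation (6) n times, starting from C(i; 0) = c0."""
    c = c0
    for _ in range(n):
        c = (1 - theta) * c + theta * (c + (1 - c) * beta * a_i)
    return c


def C_closed(n, theta, a_i, beta, c0=0.0):
    """Equation (8)."""
    return 1 - (1 - c0) * (1 - theta * beta * a_i) ** n


LAM, BETA, R = 0.41, 0.55, 18  # lambda and beta as reported for the r = 18 fit
print([round(A_closed(i, R, LAM, BETA), 4) for i in (1, 9, 16, 17, 18)])
```

The special cases $i = r, r-1, r-2$ of the sum reproduce the bracketed factors in (3), (4), and (5).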


Similar sets of equations (see Appendix A) can be written for the probability of an anticipatory error and failure-to-respond. However, for simplicity, analysis is limited here to $C(i; n)$.

For the typical rote serial learning situation, assume $C(i; 0) = 0$; that is, on the first run through the list S will make no correct anticipations. The probability of an error on trial $n$ at position $i$ is $1 - C(i; n)$, and the number of errors at position $i$ during the first $x + 1$ trials is

$$\sum_{n=0}^{x}\left[1 - C(i;n)\right] = \frac{1 - (1 - \theta_i\beta A_i)^{x+1}}{\theta_i\beta A_i}. \tag{9}$$

As $x$ becomes large this expression approaches

$$\frac{1}{\theta_i\beta A_i}. \tag{10}$$
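Equations (2), (7), (9), and (10) together predict the serial-position curve of total errors. The Python sketch below evaluates them for $r = 18$ over 16 trials, using the fitted values $\lambda = .41$, $\beta = .55$, $\theta_1 = 1.00$, $c = .64$ reported in the Application to Data section. The fitted value of $\eta$ is not legible in this copy, so $\eta = 0.30$ here is a hypothetical stand-in, and the resulting numbers only illustrate the shape of the curve.

```python
def theta(i, theta1, c, eta):
    return theta1 - c * (1 - (1 - eta) ** (i - 1))              # equation (2)


def A(i, r, lam, beta):
    q = (1 - lam) * (1 - beta)
    return (lam + (1 - lam) * beta * q ** (r - i)) / (1 - q)    # equation (7)


def expected_errors(i, r, trials, lam, beta, theta1, c, eta):
    """Equation (9) with x + 1 = trials and C(i; 0) = 0."""
    a = theta(i, theta1, c, eta) * beta * A(i, r, lam, beta)
    return (1 - (1 - a) ** trials) / a


def asymptotic_errors(i, r, lam, beta, theta1, c, eta):
    """Equation (10): the limit of (9) as the number of trials grows."""
    return 1 / (theta(i, theta1, c, eta) * beta * A(i, r, lam, beta))


PARAMS = dict(lam=0.41, beta=0.55, theta1=1.00, c=0.64, eta=0.30)  # eta hypothetical
r = 18
curve = [expected_errors(i, r, 16, **PARAMS) for i in range(1, r + 1)]
peak = max(range(1, r + 1), key=lambda i: curve[i - 1])
print("position with most errors:", peak)
```

With these values the predicted errors are smallest at position one, rise to a maximum at an interior position, and fall again toward position $r$, the bowed serial-position curve the model is meant to capture.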


Application to Data


Data have been collected for different list lengths with a one-minute intertrial interval [1]. The lists were composed of familiar and easily pronounced two-syllable adjectives; no two words possessed similar meaning or phonetic construction. The data on total number of errors over the first 16 trials at each list position are presented in Figure 2. Each curve is based on data from 42 Ss obtained in a situation employing a latin square design. Evidence on the intertrial interval [1] suggests that the one-minute interval approximates the theoretical infinite intertrial interval. Hence equations (2) and (10) are applicable. These equations were used to provide a visual fit to the data for the list in which $r$ equals 18; the resulting parameter values were $\lambda = .41$, $\beta = .55$, $\theta_1 = 1.00$, $c = .64$, and $\eta$ = [illegible]. These values were substituted in equations (2) and (10) to yield the curves for $r$ equal to 8 and 13. An inspection of Figure 2 indicates agreement between predicted and observed values.

[Figure 2: mean number of errors (ordinate) plotted against list position 1 to 18 (abscissa); observed points and theoretical curves for r = 8, 13, and 18.]

FIGURE 2

Theoretical and observed values of mean number of errors by serial positions over the first 16 trials for lists of length 8, 13, and 18.


Discussion 


In the introduction the class of rote serial learning experiments to which the model is presumed to apply was delimited. The reasons for these restrictions are:

(a) Moderate presentation rate. A presentation rate that is too rapid would tend to decrease the likelihood of overt verbal responses and lead to an increase in the number of failures-to-respond. Consequently the model, when applied to conditions of rapid presentation, would underestimate the observed number of failures-to-respond. On the other hand, the model assumes that a single sample is drawn from $S_i$ during the $W_i$ exposure, an assumption which depends on a short exposure period. Experimentally these difficulties can be resolved by a short word exposure period followed by a blank exposure during which S provides an anticipation or failure-to-respond. An extension of the model to the case of a rapid rate has been examined, but the equations will not be displayed here.

(b) Highly dissimilar words. It is required in the model that the $S_i$ sets be pairwise disjoint. This simplifying assumption is suspect for any serial learning situation, but it appears to provide an adequate approximation in this restricted situation. For the case of highly similar list words a set of elements common to each $S_i$ would be introduced; the additional problems generated in this case are not considered here.

(c) Familiar and easily pronounced words. For the model, this restriction refers to a state such that the occurrence of the hypothesized $W_i$-$R'_i$ relation is invariant over trials. For nonsense syllable learning the model would require, as an additional feature, a function describing the acquisition over trials of the $W_i$-$R'_i$ connection [7].

In analyzing the model, the case where the intertrial interval is long has been considered. With a short interval the equations become more complex. Now some elements sampled on trial $n$ remain available throughout the intertrial interval and into the next run through the list. For example, assume that an element is sampled from $S_{r-1}$ on trial $n$ and not returned to $S^*$ for five h-intervals; the probability of this event is …. If $k = 1$, the element will be returned after the occurrence of $R'_1$ on trial $n + 1$. Consequently, there is a probability $\beta[1 - C(r-1; n)]$ that this element is conditioned to the response class $R_1$. The element, when sampled again, increases the likelihood of an $R_1$ anticipatory response which, at position $r - 1$, would be classified as a perseverative error. It follows that the shorter the intertrial interval, the greater the number of perseverative errors. This result has been experimentally verified [1].


Appendix A


Probability of a Failure-to-Respond and an Anticipatory Error


For the case of an infinite intertrial interval the probability of a failure-to-respond at position $i$ on trial $n + 1$ is

$$R(i; n+1) = (1-\theta_i)\,R(i;n) + \theta_i\,[1 - C(i;n)]\,(1-\beta)A_i. \tag{11}$$

The solution [5, p. 584] of this difference equation is

$$R(i;n) = (1-\theta_i)^n R(i;0) + [1 - C(i;0)]\,\frac{(1-\beta)A_i}{1-\beta A_i}\left[(1-\theta_i\beta A_i)^n - (1-\theta_i)^n\right], \tag{12}$$

where $R(i; 0)$ is the probability of a failure-to-respond on the initial run through the list. The probability of an anticipatory error is

$$A(i;n) = 1 - C(i;n) - R(i;n). \tag{13}$$

For the typical experimental situation, assume $C(i;0) = 0$ and $R(i;0) = 1$; then (13) reduces to

$$A(i;n) = \frac{1 - A_i}{1 - \beta A_i}\left[(1-\theta_i\beta A_i)^n - (1-\theta_i)^n\right]. \tag{14}$$

Equations (12) and (14), when summed over the first $x$ trials, as was done in (9) for incorrect responses, produce functions for failures-to-respond and anticipatory errors of the form reported by Deese and Kresse [2].
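Iterating (6) and (11) side by side and comparing with the closed forms (12) and (14) is a quick way to confirm that the three response probabilities stay consistent. The Python sketch below does that; the function names are hypothetical, $\lambda = .41$ and $\beta = .55$ are the reported fitted values, and $\theta_i = 0.5$ is an arbitrary test value.

```python
def A_i(i, r, lam, beta):
    q = (1 - lam) * (1 - beta)
    return (lam + (1 - lam) * beta * q ** (r - i)) / (1 - q)  # equation (7)


def iterate(n, theta, a, beta, c0=0.0, r0=1.0):
    """Apply (6) and (11) together n times; return (C, R, anticipatory error)."""
    c, rr = c0, r0
    for _ in range(n):
        # Both updates use the trial-n value of C, so compute them from the old c.
        c, rr = (
            (1 - theta) * c + theta * (c + (1 - c) * beta * a),
            (1 - theta) * rr + theta * (1 - c) * (1 - beta) * a,
        )
    return c, rr, 1 - c - rr  # the last entry is equation (13)


def R_closed(n, theta, a, beta, c0=0.0, r0=1.0):
    """Equation (12)."""
    k = (1 - beta) * a / (1 - beta * a)
    return (1 - theta) ** n * r0 + (1 - c0) * k * ((1 - theta * beta * a) ** n - (1 - theta) ** n)


def A_closed(n, theta, a, beta):
    """Equation (14), valid for C(i; 0) = 0 and R(i; 0) = 1."""
    return (1 - a) / (1 - beta * a) * ((1 - theta * beta * a) ** n - (1 - theta) ** n)


LAM, BETA, R = 0.41, 0.55, 18
a = A_i(9, R, LAM, BETA)
print([round(A_closed(n, 0.5, a, BETA), 4) for n in range(6)])
```

Because $A_r = 1$, (14) predicts no anticipatory errors at the last list position when the intertrial interval is infinite; the check below includes that case.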

Appendix B
List of Symbols and Their Meanings


$A(i; n)$ probability of an anticipatory error at position $i$ on trial $n$.

$\beta$ conditioning constant associated with an incorrect anticipation.

$c$ $\theta_1 - \theta_\infty$.

$C(i; n)$ probability of a correct anticipation at position $i$ on trial $n$.

$A_i$ function defined over $i$; dependent on $r$, $\lambda$, and $\beta$.

$\eta$ decay constant related to the decrement in $\theta_i$ as $i$ increases.

$h$ time of a single word exposure.

$k$ number of h-intervals in the intertrial interval.

$\lambda$ probability that an available element will be returned to $S^*$ during the next h-interval.

$n$ trial number.

$r$ list length.

$R'_i$ hypothesized covert response; reading $W_i$.

$R_i$ response class; overt anticipation of $W_i$.

$R_0$ response class; failure-to-respond.

$R(i)$ $R_j$ recorded by experimenter to $W_i$.

$R(i; n)$ probability of a failure-to-respond at position $i$ on trial $n$.

$S^*$ set of stimulus elements of which all $S_i$ are subsets.

$S_i$ set of stimulus elements associated with $W_i$.

$\theta_i$ probability of sampling an element from $S_i$ when $W_i$ occurs.

$W_i$ $i$th word presentation, where $W_1$ is cue for first anticipation.
REFERENCES


[1] Atkinson, R. C. An analysis of rote serial position effects in terms of a statistical model. Unpublished doctor's dissertation, Indiana Univ., 1954.

[2] Deese, J. and Kresse, F. H. An experimental analysis of the errors in rote serial learning. J. exp. Psychol., 1952, 44, 199-202.

[3] Estes, W. K. Toward a statistical theory of learning. Psychol. Rev., 1950, 57, 94-107.

[4] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.

[5] Jordan, C. Calculus of finite differences. New York: Chelsea.

[6] McGeoch, J. A. and Irion, A. L. The psychology of human learning. New York: Longmans, Green, 1952.

[7] Noble, C. E. The effect of familiarization upon serial verbal learning. J. exp. Psychol., 1955, 49, 333-337.


Manuscript received 1/12/56
Revised manuscript received 2/22/56


PSYCHOMETRIKA—VOL. 22, NO. 1 
MARCH, 1957 


A COMPUTATIONAL PROCEDURE FOR TAU CORRELATION* 


DESMOND S. CARTWRIGHT


UNIVERSITY OF CHICAGO 


The tau coefficient is defined, and a computational procedure for tied ranks is described. The procedure maintains continuous computational checks, saves labor, and particularly facilitates the use of tau with large samples. It is also shown how tau correlation may be applied to Q-sorts with any shape of forced distribution or with unforced distributions.


It frequently happens in psychological research that the only appropriate test for an hypothesis is one that does not assume normality for the variates concerned. Where correlation is at issue, a non-parametric coefficient is required. Spearman's rho is well known. Kendall's tau [3], a newer method of rank correlation, has several advantages over rho. Most important is the fact that the significance of a sample tau can be accurately evaluated on the basis of the normal probability integral for n > 10, while for n ≤ 10 exact tables are available. The chief disadvantage of tau is the computational labor involved. This paper presents a method of computation designed especially for large samples and multiple ties.


Definition 


Among n individuals there are n(n − 1)/2 relations between pairs. If the individuals be ranked on two variates, each pair can agree or differ as to the order of ranks. Tau is defined as

$$\tau = \frac{P - Q}{n(n-1)/2}, \tag{1}$$

where n is the number of individuals, P is the number of pairs having the same rank order on both variates, and Q is the number of pairs having inverse orders. If the two rankings agree perfectly, then P = n(n − 1)/2, Q = 0, and τ = +1.00. If one ranking is a perfect inversion of the other, then Q = n(n − 1)/2, P = 0, and τ = −1.00.
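Definition (1) can be applied literally by examining every one of the n(n − 1)/2 pairs. The Python sketch below does exactly that for untied ranks; the function name is hypothetical. It makes plain the pair-by-pair labor that the procedure described in this paper is designed to reduce.

```python
from itertools import combinations


def kendall_tau_untied(x, y):
    """Equation (1): tau = (P - Q) / [n(n - 1)/2], examining every pair once."""
    n = len(x)
    p = q = 0
    for a, b in combinations(range(n), 2):
        s = (x[a] - x[b]) * (y[a] - y[b])
        if s > 0:
            p += 1  # same rank order on both variates
        elif s < 0:
            q += 1  # inverse order
    return (p - q) / (n * (n - 1) / 2)


print(kendall_tau_untied([1, 2, 3, 4], [2, 1, 4, 3]))
```

For the rankings in the example, four pairs agree and two are inverted, so τ = (4 − 2)/6.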


*The procedure described was developed in connection with research at the Counseling Center, University of Chicago. The research is supported by a grant (PHS M 903) from the National Institute of Mental Health, of the National Institutes of Health, Public Health Service.




Alternate Formulas 


Kendall has given formulas equivalent to (1) which reduce computational labor. For example,

$$\tau = 1 - \frac{2Q}{n(n-1)/2}. \tag{2}$$

The transformation from (1) to (2) is

$$\tau = \frac{P - Q}{n(n-1)/2} = \frac{P + Q}{n(n-1)/2} - \frac{2Q}{n(n-1)/2}, \tag{3}$$

which reduces to (2) only when P + Q = n(n − 1)/2. This is true if, and only if, there are no ties. Tied pairs can generate neither an agreement nor an inversion; they generate only zero scores. Let the sum of these be Z. In the tied case n(n − 1)/2 = P + Q + Z, so P + Q falls short of n(n − 1)/2 whenever Z > 0, and the transformation (3) cannot yield (2). It is for this reason that Bright's recent procedure for computing τ [1] is inappropriate for the tied case. Only formula (1) is appropriate here.
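The failure of (2) with ties is easy to see on a tiny case. In the Python sketch below (function names hypothetical), one tied pair on the first variate produces a zero score, so (1) and (2) disagree; with untied ranks they coincide.

```python
from itertools import combinations


def counts(x, y):
    """Return (P, Q, Z): agreements, inversions, and pairs tied on either variate."""
    p = q = z = 0
    for a, b in combinations(range(len(x)), 2):
        s = (x[a] - x[b]) * (y[a] - y[b])
        if s > 0:
            p += 1
        elif s < 0:
            q += 1
        else:
            z += 1
    return p, q, z


def tau_formula_1(x, y):
    p, q, _ = counts(x, y)
    n = len(x)
    return (p - q) / (n * (n - 1) / 2)


def tau_formula_2(x, y):
    _, q, _ = counts(x, y)
    n = len(x)
    return 1 - 2 * q / (n * (n - 1) / 2)


x, y = [1, 2.5, 2.5, 4], [1, 2, 3, 4]  # one tie on the first variate
print(counts(x, y), tau_formula_1(x, y), tau_formula_2(x, y))
```

Here P = 5, Q = 0, Z = 1: formula (1) gives 5/6, while (2) wrongly reports perfect agreement.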


Procedure 


If only one ranking contains ties, arrange the paired ranks as nearly as possible in natural order from left to right on the tied variate. Bracket each tied set. An example follows, with $R_1$ the tied variate:

$R_1$:   1   [3   3   3]   5   [7   7   7]
$R_2$:   3   [2   4   7]   6   [1   5   8]

P = 5 + 3 + 3 + 1 + 1 + 0 + 0 + 0 = 13
Q = 2 + 1 + 1 + 3 + 2 + 0 + 0 + 0 = 9
Z = 0 + 2 + 1 + 0 + 0 + 2 + 1 + 0 = 6

Check Σ = P + Q + Z = 7 + 6 + 5 + 4 + 3 + 2 + 1 + 0 = 28


Certain rules of procedure may be set out as follows:


(1) There is an agreement, at any given number in $R_2$, for every larger number to the right of it which is not in the same bracket.

(2) There is an inversion, at any given number in $R_2$, for every smaller number to the right of it which is not in the same bracket.

(3) There is a zero, at any given bracketed number in $R_2$, for every number to the right of it in the same bracket.


Every given pair of individuals generates either an agreement (larger number to the right), an inversion (smaller number to the right), or a zero (in the same bracket). For every ith individual, proceeding from left to right, there will be n − i individuals to the right, each of which must generate an agreement, an inversion, or a zero. A check row is affixed, with entries n − i; for each individual, (agreements + inversions + zeros) = n − i. Also for the totals, P + Q + Z = n(n − 1)/2 = Σ(n − i).
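Rules (1) to (3), together with the check row, can be written as a short column-by-column tally. In the Python sketch below (function name hypothetical), each bracket is a sub-list of $R_2$ values and an untied individual is a one-element list. The paired ranks used are those consistent with the first example's tallies, i.e. $R_2$ = 3, [2 4 7], 6, [1 5 8], which give Q = 9, Z = 6, and, from the 28-pair check, P = 13. The sketch assumes $R_2$ is untied, as in that example.

```python
def tally_one_tied_variate(groups):
    """Apply rules (1)-(3) column by column, verifying each check-row entry.

    `groups` lists the R_2 values left to right, one sub-list per bracket
    (a set tied on R_1); an untied individual is a one-element list.
    R_2 itself is assumed untied here.
    """
    flat = [(g, v) for g, group in enumerate(groups) for v in group]
    n = len(flat)
    P, Q, Z = [], [], []
    for i, (g, v) in enumerate(flat):
        right = flat[i + 1:]
        P.append(sum(1 for h, w in right if h != g and w > v))  # rule (1)
        Q.append(sum(1 for h, w in right if h != g and w < v))  # rule (2)
        Z.append(sum(1 for h, _ in right if h == g))            # rule (3)
        assert P[-1] + Q[-1] + Z[-1] == n - (i + 1)             # check-row entry n - i
    return P, Q, Z


P, Q, Z = tally_one_tied_variate([[3], [2, 4, 7], [6], [1, 5, 8]])
print(sum(P), sum(Q), sum(Z))
```

The per-column rows match the P, Q, and Z rows of the example, and their totals add to 28.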

If there are ties in both rankings, arrange the paired ranks as nearly as possible in natural order from left to right on one tied variate $R_1$. Bracket each set tied on $R_1$. Add one more rule to the three given above:


(4) There is a zero, at any given number in $R_2$, for every equal number to the right of it which is not in the same bracket.


$R_1$:   1   2   [5   5   5   5   5]   8    [10   10   10]   12   13
$R_2$:   4   5   [2   6   8   8   8]   13   [11    2    2]   11   11

P = 9 + 8 + 4 + 4 + 4 + 4 + 4 + 0 + 0 + 2 + 2 + 0 + 0 = 41
Q = 3 + 3 + 0 + 2 + 2 + 2 + 2 + 5 + 0 + 0 + 0 + 0 + 0 = 19
$Z_1$ = 0 + 0 + 4 + 3 + 2 + 1 + 0 + 0 + 2 + 1 + 0 + 0 + 0
$Z_2$ = 0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 + 2 + 0 + 0 + 1 + 0
Z = $Z_1$ + $Z_2$ = 0 + 0 + 6 + 3 + 2 + 1 + 0 + 0 + 4 + 1 + 0 + 1 + 0 = 18

Check Σ = P + Q + Z = 12 + 11 + 10 + 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 + 0 = 78

For clarity the zero scores arising from rules (3) and (4) are separated in tabulation. Zeros arising from (3) are entered in the row $Z_1$. Zeros arising from (4) are entered in row $Z_2$. The row Z is given by $Z_1 + Z_2$.
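Adding rule (4) only requires a second zero row. The Python sketch below (function name hypothetical) keeps rule-(3) zeros in $Z_1$ and rule-(4) zeros in $Z_2$, and runs on the second example, whose $R_2$ values in bracket order are 4, 5, [2 6 8 8 8], 13, [11 2 2], 11, 11.

```python
def tally_two_tied_variates(groups):
    """Rules (1)-(4); zeros from rule (3) go to Z1, zeros from rule (4) to Z2."""
    flat = [(g, v) for g, group in enumerate(groups) for v in group]
    rows = {"P": [], "Q": [], "Z1": [], "Z2": []}
    for i, (g, v) in enumerate(flat):
        right = flat[i + 1:]
        rows["P"].append(sum(1 for h, w in right if h != g and w > v))    # rule (1)
        rows["Q"].append(sum(1 for h, w in right if h != g and w < v))    # rule (2)
        rows["Z1"].append(sum(1 for h, _ in right if h == g))             # rule (3)
        rows["Z2"].append(sum(1 for h, w in right if h != g and w == v))  # rule (4)
    rows["Z"] = [a + b for a, b in zip(rows["Z1"], rows["Z2"])]
    return rows


rows = tally_two_tied_variates([[4], [5], [2, 6, 8, 8, 8], [13], [11, 2, 2], [11], [11]])
print(sum(rows["P"]), sum(rows["Q"]), sum(rows["Z"]))
```

The totals reproduce P = 41, Q = 19, and Z = 18, which sum to the 78 pairs of the check row.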
Our second example has n = 13. With 78 relations between pairs to be examined, computation is already laborious. A short-cut is provided by the following method.

After setting out $R_1$ and $R_2$ as before, $R_2$ is given a separate tabular form. This is called a "B-chart". The stub-head contains all ranks and mid-ranks of $R_2$ written in natural order. For any set of ties, only the mid-rank is represented. The top row of the table proper shows the number of individuals in each set. Let that number be v. Columns are then filled in with entries decreasing successively by 1 from v until unity is reached. The complete B-chart for our second example appears like this:

Rank or mid-rank:          2   4   5   6   8   11   13
Number of individuals:     3   1   1   1   3    3    1
                           2               2    2
                           1               1    1


The B-chart gives an orderly representation of $R_2$. The sum of numbers in the top row is equal to n. Each entry in the top row gives the number of individuals having the rank or mid-rank shown in the stub-head. Thus there are three individuals with mid-rank of 8. The chart shows at a glance how many individuals have a rank number larger than a given number. Thus the sum of numbers in the top row to the right of column 8 is equal to 4. It shows how many individuals have a rank number smaller than a given number. Thus the sum of numbers in the top row to the left of column 8 is equal to 6.

Computation by the rules listed above proceeds from left to right. Before applying the computation rules to any given number, we strike out its appropriate entry in the B-chart. Thus, for any given number in $R_2$, the chart will show how many numbers to the right of that number in $R_2$ are larger than, smaller than, or equal to the given number. Thus rapid computation for rules (1), (2) and (4) is possible. Rules for using the B-chart follow:


(5) For any given non-bracketed number in $R_2$, enter the B-chart under the column-head with that number, and strike out the topmost visible figure in the column. The sum of topmost visible figures (i.e., not struck out) in all columns to the right of that entered gives the number of agreements for rule (1). The sum of topmost visible figures in all columns to the left of that entered gives the number of inversions for rule (2). In the column entered, the topmost visible figure (after the strike-out) gives the number of zeros for rule (4).

(6) Before computing by rules (1), (2) and (4) for any given bracketed number in $R_2$, enter the B-chart under the column-head with that number and strike out the topmost visible figure in the column. Repeat for each number in the bracket until all appropriate strike-outs are made. Computations then proceed as in rule (5).


Use of rules (5) and (6) will be illustrated with the second example. The first number in $R_2$ is 4. By rule (5) the entry 1 in column 4 of the B-chart is struck out. Topmost visible figures in columns to the right of 4 are 1, 1, 3, 3, and 1, giving 9 agreements by rule (1). Summing to the left gives 3 inversions by rule (2). There are no zeros by rule (4). The second number in $R_2$ is 5. By rule (5) the entry 1 in column 5 is struck out. Summing to the right gives 8 agreements by rule (1). Summing to the left gives 3 inversions by rule (2). There are no zeros by rule (4). The third number in $R_2$ is 2, and it is bracketed. By rule (6) the entry 3 in column 2 of the B-chart is struck out. Other numbers in the same bracket are 6, 8, 8 and 8. The entry 1 in column 6, and the entries 3, 2 and 1 in column 8 are all struck out. Just before computing for the third number in $R_2$, the B-chart would look like this (x marks a struck-out entry):

Rank or mid-rank:          2   4   5   6   8   11   13
Number of individuals:     x   x   x   x   x    3    1
                           2               x    2
                           1               x    1

For the third number in $R_2$, which is 2, summing to the right gives 4 agreements by rule (1); there are no inversions by rule (2); there are two zeros by rule (4), as given by the topmost visible figure in column 2 itself.

Rule (3) does not require use of the B-chart. In a bracketed set of t numbers in $R_2$, the ith number generates t − i zeros by rule (3). This number of zeros is entered directly in row $Z_1$.
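The B-chart amounts to a running count of not-yet-struck individuals per rank or mid-rank. The Python sketch below models it with `collections.Counter` (a stand-in for the paper chart, not part of the original method; the function name is hypothetical): strike out, then sum the counts to the right and left of the column, and read the column itself for rule (4).

```python
from collections import Counter


def tally_with_b_chart(groups):
    """Rules (5) and (6): read P, Q and rule-(4) zeros off a B-chart.

    The chart is a Counter mapping each rank or mid-rank of R_2 to the number
    of entries not yet struck out (its topmost visible figure).
    """
    chart = Counter(v for group in groups for v in group)
    P, Q, Z1, Z2 = [], [], [], []
    for group in groups:
        for v in group:  # rule (6): strike out the whole bracket first
            chart[v] -= 1
        t = len(group)
        for idx, v in enumerate(group, start=1):
            P.append(sum(k for rank, k in chart.items() if rank > v))  # columns to the right
            Q.append(sum(k for rank, k in chart.items() if rank < v))  # columns to the left
            Z2.append(chart[v])  # same column, rule (4)
            Z1.append(t - idx)   # rule (3): the ith of t bracketed numbers gives t - i zeros
    return P, Q, Z1, Z2


groups = [[4], [5], [2, 6, 8, 8, 8], [13], [11, 2, 2], [11], [11]]
P, Q, Z1, Z2 = tally_with_b_chart(groups)
print(P[:3], Q[:3], Z2[2])
```

The first three columns reproduce the readings worked through above (9, 8, 4 agreements; 3, 3, 0 inversions; two rule-(4) zeros for the bracketed 2), and the totals agree with the pair-by-pair tally.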


Formulas for Ties 


The method described above yields P and Q, components of the numerator for τ. Kendall gives two alternative denominators for use with ties. The first, $\tau_a$, implies an untied criterion ranking. Ties indicate departures from this criterion even as inversions do. Hence the denominator remains n(n − 1)/2. The second, $\tau_b$, does not imply a criterion. The presence of ties simply reduces the maximum possible number of agreements. The denominator is accordingly reduced and made equal to the geometric mean of $P_{\max}$ for $R_1$ and $P_{\max}$ for $R_2$. For any set of n untied ranks, $P_{\max} = n(n-1)/2$ and $Q_{\min} = 0$. If there are ties, $Q_{\min}$ is still 0, but every pair of numbers within a tie generates a zero, and thereby reduces $P_{\max}$. Within any tie, the number of such zeros is t(t − 1)/2. Hence for $R_1$, $P_{\max} = n(n-1)/2 - \sum_t [t(t-1)/2]$, where for each tie t is the number of ranks tied, and $\sum_t$ means summation over all sets of ties in $R_1$. If we label the ties in $R_2$ u, then $P_{\max}$ for $R_2$ is $n(n-1)/2 - \sum_u [u(u-1)/2]$. Then,

$$\tau_b = \frac{P - Q}{\sqrt{[n(n-1)/2 - T]\,[n(n-1)/2 - U]}}, \tag{5}$$

where

$$T = \sum_t t(t-1)/2, \qquad U = \sum_u u(u-1)/2.$$

For our second example, T = (5·4/2) + (3·2/2) = 13; U = (3·2/2) + (3·2/2) + (3·2/2) = 9; and

$$\tau_b = \frac{41 - 19}{\sqrt{(78 - 13)(78 - 9)}} = .329.$$
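The tie corrections T and U depend only on the sizes of the tied sets, so they can be read from a frequency count of each ranking. The Python sketch below (function names hypothetical) applies formula (5) to the second example's rankings and P, Q totals.

```python
from collections import Counter
from math import sqrt


def tie_correction(ranks):
    """Sum of t(t - 1)/2 over every set of tied ranks (T for R_1, U for R_2)."""
    return sum(t * (t - 1) // 2 for t in Counter(ranks).values())


def tau_b(P, Q, r1, r2):
    """Formula (5): (P - Q) / sqrt{[n(n-1)/2 - T][n(n-1)/2 - U]}."""
    n = len(r1)
    pairs = n * (n - 1) // 2
    return (P - Q) / sqrt((pairs - tie_correction(r1)) * (pairs - tie_correction(r2)))


r1 = [1, 2, 5, 5, 5, 5, 5, 8, 10, 10, 10, 12, 13]
r2 = [4, 5, 2, 6, 8, 8, 8, 13, 11, 2, 2, 11, 11]
print(round(tau_b(41, 19, r1, r2), 3))  # 0.329
```

The ties in $R_1$ (one set of five, one of three) give T = 13, and the three sets of three in $R_2$ give U = 9.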




Application of τ to Large Samples


The procedure described in this paper may be used with large samples of any kind of appropriate data. However, illustrations will here be confined to investigations using subjective metrics.

Butler and Fiske [2] have recently argued for increased use of subjective metrics in personality assessment. They point out [2, p. 332] that a card-sort, as used in Stephenson's Q-technique [4], can be regarded as a ranking with a fixed number of ties. They assert that τ correlation is an appropriate method of analysis for such sorts.

When cards are forced into a normal distribution by the experimental instructions it seems immaterial whether the product moment or the τ correlation procedure is employed. In some investigations, however, it is desired to use a forced rectangular distribution, or some other forced shape of distribution which may or may not satisfy the requirements for using a product moment coefficient. In other investigations maximum reduction of experimental constraint may be required, so that the subject is left free to distribute the cards in any way he pleases. Under such conditions the τ coefficient provides an appropriate test for hypotheses concerned with correlation.

In Table 1 a B-chart is presented for the case of forced-normal sorts of 64 cards. It will be noted that the column heads are pile numbers instead of ranks. The subject is required to sort the cards into 7 piles with the given distribution. The metric provided by the instruction may be "from most to least significant for you." The basic order relations between cards are therefore given by the pile numbers, and it is unnecessary to transform these to ranks.

Suppose $R_1$ and $R_2$ on the 64 cards are set out as for our second example. Suppose the first number in $R_2$ is 4. Then, following rule (5), there are 15 + 6 + 1 = 22 agreements by rule (1); there are 15 + 6 + 1 = 22 inversions by rule (2); and there are 19 zeros by rule (4). Instead of n − 1 = 63 observations of the relations between the first member and the other members, only three readings are made from the chart in Table 1.
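The three readings can be reproduced by modelling the Table 1 chart as pile counts. The pile sizes below (1, 6, 15, 20, 15, 6, 1) are an assumption for this sketch: a symmetric distribution that totals 64 cards and matches the counts quoted in the text. The function name is hypothetical.

```python
from collections import Counter

# Assumed pile sizes for the 64-card forced-normal sort; they total 64 and
# agree with the 22 / 22 / 19 readings quoted in the text.
PILE_SIZES = {1: 1, 2: 6, 3: 15, 4: 20, 5: 15, 6: 6, 7: 1}


def first_card_readings(pile, sizes):
    """Rule (5) for the first card: strike it out, then take three readings."""
    chart = Counter(sizes)
    chart[pile] -= 1
    agreements = sum(k for p, k in chart.items() if p > pile)  # piles to the right
    inversions = sum(k for p, k in chart.items() if p < pile)  # piles to the left
    zeros = chart[pile]                                        # same pile, rule (4)
    return agreements, inversions, zeros


print(first_card_readings(4, PILE_SIZES))  # (22, 22, 19)
```

The three readings account for all 63 relations between the first card and the others.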

Where work is being done with forced sorts, one blank chart of the kind shown in Table 1 can be mimeographed for all correlations on sorts with the given distribution. When the investigation is concerned with unforced sortings, individual charts must be prepared. For extensive research of this kind, however, it is possible to set up a generalized chart with rows r = (maximum likely number of cards sorted into any one pile), and columns c = (maximum likely number of piles). Every column is then filled with numbers descending from r successively by 1. For a particular correlation the B-chart is then drawn in ink. This procedure is exemplified in Table 2.

The distribution outlined in Table 2 gives the B-chart for a sorting of 26 colors in terms of preference. Using this chart, and machine summation across the rows of P, Q, and Z, the computation of τ for two unforced sortings of 26 items can be done in less than 10 minutes. The entire routine is illustrated in Table 3.

[TABLE 1. B-chart for a Forced-Normal Sort of 64 Items (column heads: pile numbers 1 to 7)]

[TABLE 2. Generalized B-chart for Unforced Sorts of 26 Items]

[TABLE 3. Example of Computational Routine for τ on Two Unforced Sorts of 26 Items]


REFERENCES


[1] Bright, H. F. A method for computing the Kendall tau coefficient. Educ. psychol. Measmt., 1954, 14, 700-704.

[2] Butler, J. M. and Fiske, D. W. Theory and techniques of assessment. In C. P. Stone (Ed.), Annual Review of Psychology. Stanford: Annual Reviews Inc., 1955. Pp. 327-356.

[3] Kendall, M. G. Rank correlation methods. London: Griffin.

[4] Stephenson, W. The study of behavior: Q-technique and its methodology. Chicago: Univ. Chicago Press, 1953.


Manuscript received 6/20/55
Revised manuscript received 1/23/56


PSYCHOMETRIKA—VOL. 22, No. 1 
MARCH, 1957 


ITERATIVE INVERSE FACTOR ANALYSIS—A RAPID METHOD 
FOR CLUSTERING PERSONS 


BERNARD M. Bass 


LOUISIANA STATE UNIVERSITY 


By interchanging persons and items, iterative inverse factor analysis provides a relatively inexpensive way of clustering persons according to their patterns of response to the items. In addition to permitting the clustering of large numbers of persons, the technique enables one to determine the bases for such clustering. The items of behavior used can be heterogeneous in content and form.


Wherry and his associates have described iterative procedures for factor-analyzing large numbers of test items [1, 2, 3]. They suggest that the iterative approach would yield the same factor structure as the traditional multiple-centroid procedure. Moreover, the iterative approach would provide the factor loadings of each item rather than merely the loadings of each total test score.
The original development involves the following procedures for a test of dichotomously scored items:

1. Obtain a total score $X_1$ on all items for each subject.
2. Obtain the tetrachoric correlation coefficient between each item and the total score.
3. Select those items to form pool 1 which correlate highest with the score $X_1$.
4. Obtain a total score $X_2$ using all items less those items in pool 1.
5. Repeat steps 2 and 3 to obtain pool 2.
6. Iterate until all communality among items has been accounted for by the pool scores.
7. The pool scores lack independence, but the obtained oblique factor matrix can be rotated to simple structure and, if desired, to an orthogonal solution.

The present article proposes that all the advantages of the iterative technique can be applied to inverse factor analysis by interchanging subjects and items. Iterative inverse factor analysis provides a means for clustering persons according to their response patterns. The behavior assessed can be measured in a variety of ways, and both sample size and the scope of behavior studied can be increased greatly with relatively little increase in analysis time.


Additional advantages may include the following: (1) The "factorial composition" of each individual person can be obtained, limited, of course, to the variety of behaviors assessed. (2) Persons can be clustered according to the pattern of their specific dichotomous responses without any initial assumptions about item relations. (3) Large numbers of persons and items can be studied. (4) The items can be mixtures of any kind of attributes and behaviors. Dichotomously grouped and quantitative data can be used simultaneously. (5) Specific items of behavior, the source or basis of "clusters," can be determined.


Inverse iterative factor analysis proceeds as follows: For N dichotom- 
ously scored items (i.e., accept—reject): 


person 


1. Obtain the frequency 
each specific item. 

with which all subjects responded accept to

2. Order all items according to the frequency with which the accept response was given. Divide this item distribution into two equal parts, an upper and a lower half.

3. Key each item according to whether it is in the upper or lower half of the distribution of acceptance.

4. Determine the frequency ($X_1$) with which a given subject responded accept to items in the upper half of the distribution.

5. Determine the frequency ($X$) with which the same subject responded accept to all $N$ items.

6. For the given subject, enter $X_1$ and $X$ in Table 1 as shown, where:

(a) $X_1$ is the number of times the given subject responded accept to the half of the items to which all subjects responded accept most frequently.

(b) $X$ is the frequency with which the given subject responded accept to all items.

(c) $N$ is the total number of items, constant for all subjects.

7. Obtain the totals $N/2$ for each row of Table 1 by dividing in half the total number of items $N$. Obtain the number of items to which the given subject responded reject, $N - X$, by subtraction. Complete the remaining cells of the four-fold table by subtraction.

8. Obtain the tetrachoric correlation between the given subject's and all subjects' tendencies to respond accept to the same $N$ items, using the data of Table 1.

9. Order all subjects according to their respective tetrachoric correlations obtained in step 8. Select those with the highest correlations for pool 1, using an arbitrary cut-off value: for example, the lowest correlation significantly different from zero at the 1 per cent level.

10. Repeat steps 1 through 9 after eliminating subjects in pool 1 from the sample. $X$ remains constant for a given subject. Note that $N$ remains constant for all subjects during the iterations, while $X_1$ varies with each successive iteration for each individual. None of the items nor any of the responses are ever eliminated from consideration as the iteration proceeds.

BERNARD M. BASS 107

TABLE 1

Table for Computing Tetrachoric Correlation Between a Given Subject and All Subjects

                                                   A Given Subject's Responses
                                                   Accept     Reject            Total
Upper 50% of items "accepted" by all subjects      X_1        N/2 - X_1         N/2
Lower 50% of items "rejected" by all subjects      X - X_1    N/2 - (X - X_1)   N/2
All items                                          X          N - X             N


11. Continue iterations until the correlations between each person and each pool of persons have been obtained. This is the unrotated oblique factor matrix.

12. Rotate to simple structure.
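Steps 1 through 11 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the author's computing routine: responses are coded 1 for accept and 0 for reject, $N$ is assumed even, ties in the item ordering are broken by item index, the cut-off is an arbitrary illustrative value, and the exact tetrachoric coefficient is replaced by the familiar cosine-pi approximation, r ≈ cos(π / (1 + √(ad/bc))). All function names are hypothetical, and step 12 (rotation) is not attempted.

```python
import math

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r of a fourfold table.

    a, b: upper-half items accepted / rejected by the subject;
    c, d: lower-half items accepted / rejected by the subject.
    """
    if b * c == 0:
        return 1.0 if a * d > 0 else 0.0   # empty off-diagonal: perfect agreement
    if a * d == 0:
        return -1.0                        # empty diagonal: perfect disagreement
    return math.cos(math.pi / (1.0 + math.sqrt(a * d / (b * c))))

def upper_half(responses, sample):
    """Steps 1-3: indices of the N/2 items accepted most often by `sample`."""
    n_items = len(responses[0])
    freq = [sum(responses[s][i] for s in sample) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: (-freq[i], i))
    return set(order[: n_items // 2])

def fourfold(subject, upper):
    """Steps 4-7: the four cells of Table 1 for one subject."""
    half = len(subject) // 2
    x1 = sum(subject[i] for i in upper)     # X_1: accepts among upper-half items
    x = sum(subject)                        # X: accepts among all N items
    return x1, half - x1, x - x1, half - (x - x1)

def pool_subjects(responses, cutoff):
    """Steps 8-11: returns the pools and one correlation column per pool."""
    remaining = list(range(len(responses)))
    pools, columns = [], []
    while remaining:
        upper = upper_half(responses, remaining)
        # Every subject is correlated with the current "all subjects" ordering.
        column = [tetrachoric_cos_pi(*fourfold(s, upper)) for s in responses]
        pool = [s for s in remaining if column[s] >= cutoff]
        if not pool:
            break                           # nobody left above the cut-off
        pools.append(pool)
        columns.append(column)
        remaining = [s for s in remaining if s not in pool]
    return pools, columns
```

For example, `pool_subjects([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], 0.5)` puts the first two subjects in pool 1 and the last two in pool 2; the returned correlation columns stand in for the unrotated oblique factor matrix of step 11.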


Adding subjects to the analysis merely serves to augment the work load arithmetically, whereas a geometric increase would be involved if traditional approaches were used. Traditionally, the clustering of, say, 100 persons would require approximately four times the work of clustering 50 persons. Clustering 1,000 persons would be unmanageable for most experiments, since the work would become 400 times as great as for clustering 50 persons. Moreover, assuming that IBM cards or scoring sheets are employed in the procedures outlined here, adding items of behavior to be scored entails relatively little work.
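The ratios in the preceding paragraph follow if the work of the traditional approach is taken to grow with the square of the number of persons (one correlation per pair) and the work of the present procedure grows linearly (one fourfold table per person per pool). That cost model is an assumption made for this illustration, and the function names are hypothetical.

```python
def traditional_work(n_persons, baseline=50):
    """Relative work if effort grows with the number of person pairs (~ n**2)."""
    return (n_persons / baseline) ** 2

def iterative_work(n_persons, baseline=50):
    """Relative work if effort grows with the number of persons (~ n)."""
    return n_persons / baseline

for n in (100, 1000):
    print(n, traditional_work(n), iterative_work(n))
```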


REFERENCES

[1] Wherry, R. J. and Gaylord, R. H. The concept of test and item reliability in relation to factor pattern. Psychometrika, 1943, 8, 247-209.

[2] Wherry, R. J., Perloff, R. and Campbell, J. T. An empirical verification of the Wherry-Gaylord iterative factor analysis procedure. Psychometrika, 1951, 16, 67-74.

[3] Wherry, R. J. and Winer, B. J. A method for factoring large numbers of items. Psychometrika, 1953, 18, 161-179.

Manuscript received 7/25/58

Revised manuscript received 1/16/98


PSYCHOMETRIKA—VOL. 22, No. 2 
JUNE, 1957 


THEORY OF LEARNING WITH CONSTANT, VARIABLE, OR
CONTINGENT PROBABILITIES OF REINFORCEMENT*

W. K. ESTES
INDIANA UNIVERSITY


The methods used in recent probabilistic learning models to generate mean curves of learning under random reinforcement are extended to the general case in which probability of reinforcement may vary in any specified manner as a function of trials, and to cases in which probability of reinforcement on a given trial is contingent upon responses or outcomes of preceding trials.


Our purpose is to develop a general model for mean curves of learning under random reinforcement in "determinate" situations. By "determinate" we signify the following restrictions. In these situations the subject is confronted with the same stimulating situation, e.g., a ready signal, at the beginning of each trial. The subject responds with one of a specified set of alternative responses, $(A_1, A_2, \ldots, A_r)$, and following his response is presented with one of a specified set of reinforcing events, $(E_1, E_2, \ldots, E_r)$, exactly one reinforcing event $E_j$ corresponding to each possible response $A_j$. In a T-maze experiment (with correction procedure), $A_1$ and $A_2$ correspond to left and right turns; $E_1$ and $E_2$ correspond to "food obtained on left" and "food obtained on right," respectively. In a simple prediction experiment with human subjects [3, 8, 9, 10, 11, 13], the responses $(A_1, A_2, \ldots, A_r)$ correspond to the subject's predictions as to which of a set of "reinforcing lights" $(E_1, E_2, \ldots, E_r)$ will appear on each trial; instructions are such that the subject interprets the appearance of $E_j$ to mean that response $A_j$ was correct. It is further assumed that one can specify in advance of any trial the probability that any given response will be followed by any given reinforcing event.

From the set-theoretical model of Estes and Burke [4, 6] plus an assumption of association by contiguity, it is possible (see [1, 8]) to derive the following quantitative law describing the change in the probability of response $A_j$ on any trial:

If $E_j$ occurs on trial $n$,

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta. \tag{1a}$$


*This paper was prepared while the writer was in residence at the Center for Advanced Study in the Behavioral Sciences, Stanford, California. The research on which it is based was supported by a faculty research grant from the Social Science Research Council.



PSYCHOMETRIKA 
114 


If $E_k$ ($k \ne j$) occurs on trial $n$,

$$p_{j,n+1} = (1-\theta)\,p_{j,n}. \tag{1b}$$


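The quantity $p_{j,n}$ represents the probability of response $A_j$ on trial $n$, and $0 < \theta \le 1$. The parameter $\theta$ may [...] and for a given organism from [...]. If the probabilities with which each of the events $E_j$ will occur are specified for a given learning experiment, then, given the initial value of $p_j$, it is a purely mathematical problem to deduce the expected value of $p_j$ on each trial and thus to generate a predicted learning curve for comparison with experimental curves. For two [...], this problem has already been solved and the predicted curves have been computed and fitted to data [1, 2, 8, ...].

[...] all the simple non-[contingent ...], hereafter designated $\pi_j$ [...]. [...] case, the probability of $E_j$ on any [trial depends upon the response made] by the subject. Thus, if the subject makes response $A_1$, the probability of $E_j$ is $\pi_{1j}$; if the subject makes response $A_2$, it is $\pi_{2j}$; and so on; but the values of $\pi_{ij}$ remain [constant over the series] of trials. Now we wish to obtain a more general solution which will yield predicted [curves for] experiments in which the [...] are permitted to vary over [...].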

General Solution and Asymptotic Properties

Let $\pi_{j,n}$ represent the probability that $E_j$ will occur on trial $n$, with $\sum_j \pi_{j,n} = 1$ for all $n$. Then, if $p_{j,n}$ is a subject's probability of making response $A_j$ on trial $n$, the expected, or mean, value of the probability* on trial $n+1$ must be

*Throughout the paper, the quantity $p_j$ should be interpreted as follows. (a) In equations dealing with learning on a particular trial, e.g., (1a) and (1b), $p_{j,n+1}$ represents the new probability on trial $n+1$ for a subject who had the value $p_{j,n}$ on trial $n$. (b) In equations dealing with the expected change on a trial, e.g., (2) and (2a), $p_{j,n+1}$ represents the expected value of $p_j$ on trial $n+1$, where the average is taken over all possible values of $p_{j,n}$ and all possible outcomes of trial $n$; the term "all possible" is defined for any given situation by the initial values of $p_j$ and the possible sequences of responses and reinforcing events over the first $n$ trials. (c) In solutions giving $p_j$ as a function of $n$, $p_{j,n}$ is the expected value of $p_j$ on trial $n$, where the average is taken over [...] and all possible sequences of responses and reinforcing events over [...] trials.


W. K. ESTES 115 


$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\,\pi_{j,n}. \tag{2}$$

To obtain (2), we average the right-hand sides of (1a) and (1b), weighting them by the probabilities $\pi_{j,n}$ and $1-\pi_{j,n}$, respectively, that $E_j$ will and will not occur.
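Equations (1a), (1b), and (2) can be checked with a short Python sketch: `update` applies (1a) or (1b) on a single trial, `expected_next` forms the weighted average that gives (2), and a seeded Monte Carlo average of many single-trial updates is compared with (2). The function names and the particular values of θ, p, and π are illustrative assumptions.

```python
import random

def update(p, reinforced, theta):
    """(1a) if E_j occurs (reinforced=True), otherwise (1b)."""
    return (1 - theta) * p + (theta if reinforced else 0.0)

def expected_next(p, pi, theta):
    """(2): average of (1a) and (1b) weighted by pi and 1 - pi."""
    return pi * update(p, True, theta) + (1 - pi) * update(p, False, theta)

def monte_carlo_next(p, pi, theta, draws=20000, seed=0):
    """Average of single-trial updates when E_j occurs with probability pi."""
    rng = random.Random(seed)
    return sum(update(p, rng.random() < pi, theta) for _ in range(draws)) / draws

p, pi, theta = 0.2, 0.7, 0.1
print(expected_next(p, pi, theta), (1 - theta) * p + theta * pi)
print(monte_carlo_next(p, pi, theta))
```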
Some general asymptotic properties of the model can be clearly displayed if we consider not simply $p_{j,n}$, the probability of a response on a particular trial, but the expected proportion of response occurrences over a series of trials. The latter quantity, which we shall designate $\bar p_j(n)$, must of course satisfy the relation

$$\bar p_j(n) = \frac{1}{n}\sum_{\nu=1}^{n} p_{j,\nu}.$$


Substituting into the right side of this expression from (2), we obtain

$$\bar p_j(n) = \frac{1}{n}\Big[p_{j,1} + \sum_{\nu=1}^{n-1}\big((1-\theta)\,p_{j,\nu} + \theta\,\pi_{j,\nu}\big)\Big] = \frac{p_{j,1} + (n-1)(1-\theta)\,\bar p_j(n-1) + (n-1)\,\theta\,\bar\pi_j(n-1)}{n},$$

where $\bar\pi_j(n-1)$ represents the expected proportion of $E_j$ reinforcing events over the first $n-1$ trials. For large $n$, the right side of the last expression approaches the limit

$$(1-\theta)\,\bar p_j(n-1) + \theta\,\bar\pi_j(n-1).$$

Further, since $\bar p_j(n-1)$ always differs from $\bar p_j(n)$ by a term of the order of $1/n$, we can write, for sufficiently large $n$, the approximate equality

$$\bar p_j(n) \approx (1-\theta)\,\bar p_j(n) + \theta\,\bar\pi_j(n-1),$$

or

$$\bar p_j(n) \approx \bar\pi_j(n-1).$$

Thus we find that no matter how $\pi_j$ varies over a series of trials, the cumulative proportions of $A_j$ and $E_j$ occurrences tend to equality as $n$ becomes large. It can be expected that this remarkably general "matching law" will play a central role in empirical tests of the theory.
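The matching law can be illustrated numerically. The sketch below iterates (2) for an arbitrary, here sinusoidal, sequence of reinforcement probabilities (an assumption chosen only to show that the form of $\pi_{j,n}$ does not matter) and reports the gap $\bar p_j(n) - \bar\pi_j(n-1)$. Summing (2) over trials shows that this gap is exactly $-\bar\pi_j(n-1)/n + [p_{j,1} - (1-\theta)\,p_{j,n}]/(\theta n)$, so it shrinks like $1/n$.

```python
import math

def mean_curve(p1, pis, theta):
    """Iterate (2); pis[v-1] is pi_{j,v}. Returns p_{j,1}, ..., p_{j,len(pis)+1}."""
    ps = [p1]
    for pi in pis:
        ps.append((1 - theta) * ps[-1] + theta * pi)
    return ps

def matching_gap(p1, pis, theta):
    """pbar_j(n) - pibar_j(n-1), with n = len(pis) + 1 trials."""
    ps = mean_curve(p1, pis, theta)
    return sum(ps) / len(ps) - sum(pis) / len(pis)

theta, p1 = 0.1, 0.9
for n in (200, 2000, 20000):
    pis = [0.5 + 0.3 * math.sin(v / 7) for v in range(1, n)]
    print(n, matching_gap(p1, pis, theta))
```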
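To study the pre-asymptotic course of learning, we proceed as follows. Suppose that a subject begins an experiment with the probability $p_{j,1}$ of making an $A_j$; then his expected probability on trial 2 will be, applying (2),

$$p_{j,2} = (1-\theta)\,p_{j,1} + \theta\,\pi_{j,1};$$

on trial 3,

$$p_{j,3} = (1-\theta)\,p_{j,2} + \theta\,\pi_{j,2} = (1-\theta)^2 p_{j,1} + \theta(1-\theta)\,\pi_{j,1} + \theta\,\pi_{j,2};$$

and, in general, on trial $n$,

$$p_{j,n} = (1-\theta)^{n-1}p_{j,1} + \theta(1-\theta)^{n-2}\pi_{j,1} + \theta(1-\theta)^{n-3}\pi_{j,2} + \cdots + \theta\,\pi_{j,n-1} = (1-\theta)^{n-1}p_{j,1} + \theta\sum_{\nu=1}^{n-1}(1-\theta)^{n-\nu-1}\pi_{j,\nu}. \tag{3}$$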
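As a check on (3), the sketch below evaluates the closed-form sum directly and compares it with step-by-step iteration of (2) for a randomly drawn sequence of $\pi$ values; the random sequence, seed, and parameter values are arbitrary choices for illustration.

```python
import random

def p_closed(n, p1, pis, theta):
    """(3): p_{j,n} from p_{j,1} and pis[v-1] = pi_{j,v}, v = 1..n-1."""
    q = 1 - theta
    return q ** (n - 1) * p1 + theta * sum(q ** (n - v - 1) * pis[v - 1] for v in range(1, n))

def p_iterated(n, p1, pis, theta):
    """Apply (2) n - 1 times starting from p_{j,1}."""
    p = p1
    for v in range(1, n):
        p = (1 - theta) * p + theta * pis[v - 1]
    return p

rng = random.Random(42)
pis = [rng.random() for _ in range(50)]
print(max(abs(p_closed(n, 0.3, pis, 0.2) - p_iterated(n, 0.3, pis, 0.2)) for n in range(1, 51)))
```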


A number of important features that will characterize the mean learning curve regardless of the nature of the function $\pi_{j,n}$ can be ascertained by rewriting (2) in the form

$$p_{j,n+1} = p_{j,n} + \theta\,(\pi_{j,n} - p_{j,n}),$$

from which we see that, on the average, response probability on any trial changes in the direction of the current value of $\pi_j$. As $n$ becomes large, the term $(1-\theta)^{n-1}p_{j,1}$ in (3) tends to zero. After $n$ is large enough so that $(1-\theta)^{n-1}p_{j,1}$ is negligible, $p_{j,n}$ is essentially a weighted mean of the $\pi_j$ values which obtained on preceding trials, with $\pi_{j,n-1}$ having most weight, $\pi_{j,n-2}$ less weight, and so on. If $\pi_{j,n}$ is an orderly function of $n$, then the curve for $p_{j,n}$ tends to parallel it, but always "follows it with a lag." If $\theta$ is equal to one, then $p_{j,n}$ is simply equal to $\pi_{j,n-1}$ throughout the series of trials; the more $\theta$ deviates from one, the more the curve for $p_{j,n}$ lags behind that for $\pi_{j,n}$.

We may gain further insight into this learning process, and at the same time develop functions that will be useful in experimental applications, by considering some special cases in which $\pi_{j,n}$ can be represented by familiar functions with simple properties.


Non-Contingent Case

a. The special case of $\pi_{j,n}$ constant


If $\pi_j$ is constant, then, as one might expect, (2) and (3) reduce to the simple expressions

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\,\pi_j \tag{2a}$$

and

$$p_{j,n} = \pi_j - (\pi_j - p_{j,1})(1-\theta)^{n-1}, \tag{3a}$$

derived by Estes and Straughan [8] from the set-theoretical model [4, 6] and, with slightly different notation, by Bush and Mosteller [2] from their linear operator model. In this case the predicted learning curve is given by a negatively accelerated function tending to $\pi_j$ asymptotically. Experimental applications of (3a) are described in references [2, 3, 5, 6, 13].
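For constant $\pi_j$, the sketch below compares (3a) with iteration of (2a) and checks that successive gains shrink by the factor $1-\theta$, which is what makes the curve negatively accelerated. The parameter values are illustrative.

```python
def p_constant(n, pi, theta, p1):
    """(3a): expected p_{j,n} when pi_j is constant."""
    return pi - (pi - p1) * (1 - theta) ** (n - 1)

pi, theta, p1 = 0.75, 0.1, 0.5
p, curve = p1, []
for n in range(1, 31):
    curve.append(p)
    p = (1 - theta) * p + theta * pi          # (2a)

gains = [b - a for a, b in zip(curve, curve[1:])]
print(max(abs(c - p_constant(n, pi, theta, p1)) for n, c in enumerate(curve, start=1)))
print(gains[1] / gains[0])                    # ratio of successive gains: 1 - theta
```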


b. The special case of $\pi_{j,n}$ linear


We shall treat this case in some detail since it has a number of properties that will be especially convenient for experimental tests of the theory. The linear function

$$\pi_{j,n} = a_j + b_j n,$$

$a_j$ and $b_j$ being constants, is not in general bounded between zero and one for all $n$; for experimental purposes, however, one need only choose values of $a_j$ and $b_j$ which, for the number of trials to be given, keep the value of $\pi_{j,n}$ within the required range. Subject to this restriction, we may substitute into (2) and (3) to obtain the expected response probability on any trial,

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\,(a_j + b_j n) \tag{2b}$$

and

$$p_{j,n} = a_j + b_j n - \frac{b_j}{\theta} - \Big(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\Big)(1-\theta)^{n-1}. \tag{3b}$$

In the interest of brevity we have omitted the detailed steps involved in summing the series in (3); the method of performing the summation in this case, and in others to be considered in following sections, is given in standard sources [12, 14]. The reader can verify that (3b) is the correct solution to (2b) by substituting the former into the latter. The main properties of (3b) are illustrated in Fig. 1. Regardless of the initial value $p_{j,1}$, after a sufficiently large number of trials the curve for $p_{j,n}$ approaches a straight line,

$$p_{j,n} = a_j - \frac{b_j}{\theta} + b_j n,$$

which has the same slope as the straight line representing $\pi_{j,n}$. If the initial value $p_{j,1}$ is greater than $\pi_{j,1}$ and the slope of $\pi_{j,n}$ is positive, $p_{j,n}$ will decrease until its curve crosses the line $\pi_{j,n}$, following which it will increase; if $b_j$ is small, the point of crossing will be approximately at the minimum value of $p_{j,n}$. To prove the last statement, we replace $n$ by a continuous variable $t$, then set the derivative of $p_{j,t}$ with respect to $t$ equal to zero and find that $p_{j,n}$ has as its minimum value

$$p_{j,\min} = a_j - \frac{b_j}{\theta} + b_j n_m - \frac{b_j}{\ln(1-\theta)},$$

where

$$n_m = 1 + \frac{\ln b_j - \ln K - \ln[-\ln(1-\theta)]}{\ln(1-\theta)}$$

and

$$K = p_{j,1} - a_j - b_j + \frac{b_j}{\theta}.$$

Subtracting $\pi_{j,n_m}$ from the minimum value of $p_{j,n}$, we find that the difference is equal to

$$-\,b_j\Big[\frac{1}{\theta} + \frac{1}{\ln(1-\theta)}\Big],$$

which is negative and does not exceed $b_j$ in absolute value for any value of $\theta$.

FIGURE 1
[Plot of PROBABILITY against TRIAL]
Curves describing changes in response probability when probability of reinforcement varies linearly with trials. The parameter $\theta$ has been taken equal to .05.
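The minimum derived above can be checked numerically. The sketch uses illustrative values chosen so that $\pi_{j,n}$ stays between 0 and 1 over 120 trials and $p_{j,1}$ lies well above the line, so that $K > 0$ and the logarithms are defined. It evaluates (3b) at a continuous $t$, confirms that the derivative vanishes at $n_m$, that the value there equals $p_{j,\min}$, that $p_{j,\min} - \pi_{j,n_m}$ is negative and no larger than $b_j$ in magnitude, and that (3b) agrees with iteration of (2b). Natural logarithms are used throughout, as the differentiation requires.

```python
import math

a, b, theta, p1 = 0.2, 0.004, 0.05, 0.9      # illustrative values

def p_linear(t):
    """(3b) with n replaced by a continuous variable t."""
    return a + b * t - b / theta - (a + b - b / theta - p1) * (1 - theta) ** (t - 1)

def p_iterated(n):
    """Iterate (2b) from p_{j,1}."""
    p = p1
    for v in range(1, n):
        p = (1 - theta) * p + theta * (a + b * v)
    return p

L = math.log(1 - theta)                      # ln(1 - theta) < 0
K = p1 - a - b + b / theta                   # positive for these values
n_m = 1 + (math.log(b) - math.log(K) - math.log(-L)) / L
p_min = a - b / theta + b * n_m - b / L
gap = p_min - (a + b * n_m)                  # p_min minus pi_j at n_m

h = 1e-5
slope_at_min = (p_linear(n_m + h) - p_linear(n_m - h)) / (2 * h)
print(round(n_m, 2), p_min, gap, slope_at_min)
```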
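To obtain an expression for $R_j(n)$, the cumulative number of $A_j$ responses expected in $n$ trials, we need only sum (3b):

$$R_j(n) = \sum_{\nu=1}^{n} p_{j,\nu} = n\Big(a_j - \frac{b_j}{\theta}\Big) + \frac{b_j\,n(n+1)}{2} - \Big(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\Big)\frac{1-(1-\theta)^n}{\theta}. \tag{4}$$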




Similarly, by summing (3b) over the $m$th block of $k$ trials and dividing by $k$, we obtain the expected proportion of $A_j$ responses in the block:

$$\bar p_j(k,m) = a_j - \frac{b_j}{\theta} + \frac{b_j}{2}(2mk - k + 1) - \Big(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\Big)\frac{(1-\theta)^{(m-1)k}\,[1-(1-\theta)^k]}{k\,\theta}. \tag{5}$$

Equation (5), despite its cumbersome appearance, has essentially the same properties as (3b) and can readily be fitted to experimental data. For a block of $k$ trials beginning with a value of $n$ large enough so that $(1-\theta)^{n-1}$ is near zero, we have the approximation

$$\bar p_j(k,m) \approx a_j - \frac{b_j}{\theta} + \frac{b_j}{2}(2mk - k + 1).$$

By substituting the observed value of $\bar p_j(k,m)$ from a set of experimental data and solving for $\theta$, we obtain an estimate of this parameter which, although not unbiased, will be adequate for many experimental purposes.
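Equations (4) and (5) and the estimation procedure just described can be sketched as follows. The code checks (4) and (5) against direct sums of (3b), then treats the exact expected proportion in a late block as the "observed" value and solves the large-$n$ approximation for $\theta$, i.e., $\hat\theta = b_j / [a_j + (b_j/2)(2mk - k + 1) - \bar p_j(k,m)]$. The parameter values, block length, and block index are illustrative assumptions; with real data an observed proportion would replace the exact one.

```python
a, b, theta, p1 = 0.2, 0.004, 0.05, 0.9      # illustrative values
q = 1 - theta
K0 = a + b - b / theta - p1                  # coefficient of the transient term

def p_linear(n):
    """(3b)."""
    return a + b * n - b / theta - K0 * q ** (n - 1)

def cumulative(n):
    """(4): expected number of A_j responses in trials 1..n."""
    return n * (a - b / theta) + b * n * (n + 1) / 2 - K0 * (1 - q ** n) / theta

def block_mean(k, m):
    """(5): expected proportion of A_j responses in the m-th block of k trials."""
    return (a - b / theta + b * (2 * m * k - k + 1) / 2
            - K0 * q ** ((m - 1) * k) * (1 - q ** k) / (k * theta))

def estimate_theta(pbar_obs, k, m):
    """Solve the large-n approximation to (5) for theta."""
    return b / (a + b * (2 * m * k - k + 1) / 2 - pbar_obs)

k, m = 20, 6                                 # block 6 covers trials 101-120
print(estimate_theta(block_mean(k, m), k, m))
```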


c. The special case $\pi_{j,n} = a_j + c_j b_j^{\,n}$

Among the possible monotone relations between $\pi_j$ and $n$, the second main type of interest is that in which $\pi_j$ approaches an asymptote. This type will be represented by the function $\pi_{j,n} = a_j + c_j b_j^{\,n}$, the values of the constants $a_j$, $b_j$, and $c_j$ being so restricted that $\pi_{j,n}$ is properly bounded between zero and one for all $n$.

Equations (2) and (3) now take the forms

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\,(a_j + c_j b_j^{\,n}) \tag{2c}$$

and, if $b_j \ne 1-\theta$,

$$p_{j,n} = a_j + \frac{\theta c_j b_j^{\,n}}{b_j - 1 + \theta} - \Big(a_j + \frac{\theta c_j b_j}{b_j - 1 + \theta} - p_{j,1}\Big)(1-\theta)^{n-1}, \tag{3c}$$

or, if $b_j = 1-\theta$,

$$p_{j,n} = a_j + c_j\theta\,(n-1)(1-\theta)^{n-1} - (a_j - p_{j,1})(1-\theta)^{n-1}. \tag{3c'}$$

Some properties of (3c) are illustrated in Fig. 2. In the upper panel, $a_j$ has been taken equal to .50, $c_j$ to 1.0, and $b_j$ to .98, so that $\pi_j$ describes a negatively accelerated decreasing curve approaching .50 asymptotically. The effect of changing the sign of $b_j$ from positive to negative can be seen by comparing the lower panel of Fig. 2, which has $b_j = -.98$, $a_j = .50$, and $c_j = 1.0$, with the upper panel. Now the values of $\pi_j$ oscillate from trial to trial between a pair of curves, the upper envelope being identical with the $\pi_{j,n}$ curve in the upper panel and the lower envelope curve the mirror image of it. The values of $p_{j,n}$ describe a damped oscillatory function; for any given set of parameter values, they fall alternately above and below the smooth curve

$$p_{j,n} = a_j - (a_j - p_{j,1})(1-\theta)^{n-1},$$

with the deviation from the smooth curve decreasing progressively in magnitude toward zero as $n$ increases.

FIGURE 2
[Upper and lower panels: PROBABILITY plotted against TRIAL]
Curves describing changes in response probability when probability of reinforcement varies exponentially with trials. The parameter $\theta$ has been taken equal to .5.

A formula for the expected number of $A_j$ responses in $n$ trials can be obtained and utilized for estimation of $\theta$ as in case (b).
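Both forms of the solution can be verified against iteration of (2c). The sketch below uses illustrative constants chosen so that $\pi_{j,n}$ stays within [0, 1] (they are not the values plotted in Fig. 2), covers a positive and a negative $b_j$, and switches to (3c') when $b_j$ equals $1-\theta$. It also checks that, with negative $b_j$, the deviations of $p_{j,n}$ from the smooth curve alternate in sign.

```python
def p_exponential(n, a, b, c, theta, p1):
    """(3c) if b != 1 - theta, otherwise (3c')."""
    q = 1 - theta
    if abs(b - q) > 1e-12:
        d = theta * c / (b - q)              # theta*c / (b - 1 + theta)
        return a + d * b ** n - (a + d * b - p1) * q ** (n - 1)
    return a + c * theta * (n - 1) * q ** (n - 1) - (a - p1) * q ** (n - 1)

def p_iterated(n, a, b, c, theta, p1):
    """Iterate (2c): p_{j,v+1} = (1 - theta) p_{j,v} + theta (a + c b**v)."""
    p = p1
    for v in range(1, n):
        p = (1 - theta) * p + theta * (a + c * b ** v)
    return p

def max_error(a, b, c, theta, p1, trials=60):
    return max(abs(p_exponential(n, a, b, c, theta, p1) - p_iterated(n, a, b, c, theta, p1))
               for n in range(1, trials + 1))

print(max_error(0.5, 0.98, 0.5, 0.5, 0.1))    # b > 0
print(max_error(0.5, -0.98, 0.5, 0.5, 0.1))   # b < 0: damped oscillation
print(max_error(0.5, 0.95, 0.5, 0.05, 0.1))   # b = 1 - theta: equation (3c')

def smooth(n, a, p1, theta):
    """Smooth curve about which p_{j,n} oscillates when b < 0."""
    return a - (a - p1) * (1 - theta) ** (n - 1)

dev = [p_exponential(n, 0.5, -0.98, 0.5, 0.5, 0.1) - smooth(n, 0.5, 0.1, 0.5)
       for n in range(2, 30)]
print(all(x * y < 0 for x, y in zip(dev, dev[1:])))   # alternating signs
```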


d. A periodic case

From an analysis of the general solution in Section (a) above, we can predict that if $\pi_j$ varies in accordance with a periodic function, then asymptotically the curve for $p_{j,n}$ will be described by a periodic function having the same period. A simple case with convenient properties for experimental purposes is the following: $\pi_j$ is constant within any one block of $k$ trials, but alternates between two values, say $a_j + b_j$ and $a_j - b_j$, on successive blocks, so that the value of $\pi_j$ on each trial of the $m$th block is given by

$$\pi_{j,m} = a_j + b_j(-1)^m.$$

The value of $p_j$ at the end of the $m$th block, that is, on trial $mk+1$, can be taken directly from section (a) above:

$$p_{j,mk+1} = a_j + b_j(-1)^m - \big[a_j + b_j(-1)^m - p_{j,(m-1)k+1}\big](1-\theta)^k.$$

Treating blocks of $k$ trials as units, this expression may be viewed as a difference equation of the same form as (2). Substituting $a_j + b_j(-1)^m$, $(1-\theta)^k$, and $m$ for the corresponding terms $\pi_{j,n}$, $(1-\theta)$, and $n$ of (2) and (3), we obtain the solution

$$p_{j,mk+1} = a_j + b_j(-1)^m\,\frac{1-(1-\theta)^k}{1+(1-\theta)^k} - \Big\{a_j + b_j\,\frac{1-(1-\theta)^k}{1+(1-\theta)^k} - p_{j,1}\Big\}(1-\theta)^{mk}. \tag{3d}$$

Equation (3d) gives us the expected value of $p_j$ at the end of the $m$th trial block. Using (3a) of section (a) again, we have for the expected value of $p_j$ on the $n$th trial of the $(m+1)$st block

$$p_{j,mk+n} = a_j - b_j(-1)^m - \big[a_j - b_j(-1)^m - p_{j,mk+1}\big](1-\theta)^{n-1}.$$

Properties of this solution are illustrated in Fig. 3. It can be seen that, regardless of its initial value, $p_j$ settles down to a periodic function with period $2k$.

FIGURE 3
[Panel labelled $\pi_{j,m} = .5 + .25(-1)^m$: PROBABILITY plotted against TRIAL]
Curves describing changes in response probability when probability of reinforcement varies periodically with trials. The parameter $\theta$ has been taken equal to .05.
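The block solution (3d) can be compared with direct iteration of (2). The sketch takes $a_j = .5$ and $b_j = .25$ from the panel label of Fig. 3 and $\theta = .05$ from its caption, together with a block length $k = 10$ and initial value $p_{j,1} = .9$, which the text does not specify and are assumed here. It also checks that the block-end values repeat with a period of two blocks, i.e., $2k$ trials.

```python
def simulate_block_ends(a, b, k, theta, p1, blocks):
    """Iterate (2) trial by trial; record p after each block (trial mk + 1)."""
    p, ends = p1, []
    for m in range(1, blocks + 1):
        pi = a + b * (-1) ** m                # pi_{j,m}, constant within block m
        for _ in range(k):
            p = (1 - theta) * p + theta * pi
        ends.append(p)
    return ends

def block_end(m, a, b, k, theta, p1):
    """(3d): expected p_{j,mk+1} at the end of the m-th block."""
    q = (1 - theta) ** k
    amp = b * (1 - q) / (1 + q)
    return a + amp * (-1) ** m - (a + amp - p1) * q ** m

a, b, k, theta, p1 = 0.5, 0.25, 10, 0.05, 0.9
ends = simulate_block_ends(a, b, k, theta, p1, 200)
print(max(abs(e - block_end(m, a, b, k, theta, p1)) for m, e in enumerate(ends, start=1)))
print(ends[-1], ends[-3], ends[-2])           # equal two blocks apart, not one
```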




e. Outcome contingencies

Many cases in which the probability of reinforcement on a given trial depends on the outcome (reinforcing event) of some preceding trial [...]. Suppose, for example, that we set the probability of $E_1$ on any trial equal to $\pi_{11}$ if an $E_1$ occurred on the $\nu$th preceding trial and to $\pi_{21}$ if an $E_2$ occurred on the $\nu$th preceding trial. Then we can write the following difference equation for $\pi_{1,n}$, the expected probability of $E_1$ on trial $n$:

$$\pi_{1,n} = \pi_{11}\,\pi_{1,n-\nu} + \pi_{21}\,(1-\pi_{1,n-\nu}) = (\pi_{11}-\pi_{21})\,\pi_{1,n-\nu} + \pi_{21},$$

which has the general solution

$$\pi_{1,n} = \pi_1 + C_1 r_1^{\,n} + C_2 r_2^{\,n} + \cdots + C_\nu r_\nu^{\,n}.$$

The $C_i$ are constants to be evaluated from the initial conditions of the experiment; the $r_i$ are roots of the characteristic equation

$$r^\nu - (\pi_{11} - \pi_{21}) = 0;$$

and $\pi_1$, the asymptotic value of $\pi_{1,n}$, is given by

$$\pi_1 = \frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}.$$

If $\nu = 1$, i.e., the probability of a given outcome depends on the outcome of the preceding trial, the formula for $\pi_{1,n}$ reduces to

$$\pi_{1,n} = \pi_1 - (\pi_1 - \pi_{1,1})(\pi_{11} - \pi_{21})^{n-1}.$$

Once $\pi_{1,n}$ has been deduced, it may be substituted into (2), and the methods already developed for non-contingent cases with varying probability of reinforcement can be applied to generate predictions about the course of learning. In the case $\nu = 1$, the difference equation for $p_{1,n}$ and its solution will be given by (2c) and (3c), respectively, with $a = \pi_1$ and $b = \pi_{11} - \pi_{21}$; this case has been discussed in some detail by Bush and Mosteller [2].

It should be emphasized that functions derived from the present model for outcome contingencies with $\nu = 1$ will generally provide satisfactory descriptions of empirical relationships only if the experiments are conducted with well-spaced trials. According to this model, the asymptotic conditional probabilities of $A_1$ on trials following $E_1$ and $E_2$ occurrences, respectively, are given by

$$p_{11} = \pi_1 + \theta(1 - \pi_1)$$

and

$$p_{21} = \pi_1 - \theta\pi_1.$$




When trials are adequately spaced, these relations may prove to be empirically confirmable, but if intertrial intervals are small enough so that the subject can form a discrimination based on the differential stimulus after-effects of $E_1$ and $E_2$ trials, then the asymptotic conditional probabilities will certainly approach $\pi_{11}$ and $\pi_{21}$. A model for the massed-trial case can be derived from a set-theoretical model for discrimination learning [1, 7]. Although a detailed presentation of the discrimination model would be beyond the scope of this paper, it is interesting to note that the discrimination model yields the same asymptotic value for the over-all mean value of $p_1$ as the present model, but yields asymptotic means for $p_{11}$ and $p_{21}$ which differ from $\pi_{11}$ and $\pi_{21}$, respectively, only by terms which are smaller than $\theta$.
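The outcome-contingency recursion for $\pi_{1,n}$ can be iterated directly. The sketch below, with illustrative values of $\pi_{11}$, $\pi_{21}$, $\theta$, and the initial outcome probabilities, checks the $\nu = 1$ closed form, confirms that the asymptote $\pi_{21}/(1 - \pi_{11} + \pi_{21})$ is the same for a lag of 3, and evaluates the two asymptotic conditional probabilities $p_{11}$ and $p_{21}$ given above.

```python
def outcome_probs(pi11, pi21, nu, initial, trials):
    """Iterate pi_{1,n} = (pi11 - pi21) * pi_{1,n-nu} + pi21.

    `initial` holds pi_{1,1}, ..., pi_{1,nu}; returns pi_{1,1..trials}.
    """
    pis = list(initial)
    while len(pis) < trials:
        pis.append((pi11 - pi21) * pis[-nu] + pi21)
    return pis

pi11, pi21, theta = 0.7, 0.4, 0.1
asymptote = pi21 / (1 - pi11 + pi21)
lag1 = outcome_probs(pi11, pi21, 1, [0.5], 50)
closed = [asymptote - (asymptote - 0.5) * (pi11 - pi21) ** (n - 1) for n in range(1, 51)]
lag3 = outcome_probs(pi11, pi21, 3, [0.5, 0.9, 0.1], 400)
p11 = asymptote + theta * (1 - asymptote)     # A_1 after an E_1
p21 = asymptote - theta * asymptote           # A_1 after an E_2
print(asymptote, lag3[-1], p11, p21)
```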


Contingent Case

Let $\pi_{ij,n}$ represent the probability that reinforcing event $E_j$ will occur on trial $n$ of a series, given that the subject makes response $A_i$ on this trial, and assume that $\sum_j \pi_{ij,n} = 1$ for all $i$ and $n$. Then, to obtain the expected value of $p_{j,n+1}$ as a function of the value on trial $n$, we again average the right-hand sides of (1a) and (1b), weighting each of the possible outcomes by its probability of occurrence, viz.,

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\sum_i p_{i,n}\,\pi_{ij,n}. \tag{6}$$

a. General solution for the case of two response classes

If there are only two response classes, $A_1$ and $A_2$, with corresponding reinforcing events, $E_1$ and $E_2$, defined for a given situation, then we have for the expected probability of $A_1$ on the second trial of a series,

$$p_{1,2} = (1-\theta)\,p_{1,1} + \theta\big[p_{1,1}\pi_{11,1} + (1-p_{1,1})\,\pi_{21,1}\big] = (1 - \theta + \theta\pi_{11,1} - \theta\pi_{21,1})\,p_{1,1} + \theta\pi_{21,1};$$

on the third trial,

$$p_{1,3} = (1-\theta)\,p_{1,2} + \theta\big[p_{1,2}\pi_{11,2} + (1-p_{1,2})\,\pi_{21,2}\big] = \alpha_1\alpha_2\,p_{1,1} + \theta\alpha_2\pi_{21,1} + \theta\pi_{21,2},$$

where we have introduced the abbreviation

$$\alpha_n = 1 - \theta + \theta\pi_{11,n} - \theta\pi_{21,n}.$$

In general, on the $n$th trial,

$$p_{1,n} = p_{1,1}\,\alpha_1\alpha_2\cdots\alpha_{n-1} + \theta\,\alpha_2\cdots\alpha_{n-1}\,\pi_{21,1} + \cdots + \theta\,\pi_{21,n-1} = p_{1,1}\prod_{\nu=1}^{n-1}\alpha_\nu + \theta\sum_{\nu=1}^{n-1}\pi_{21,\nu}\prod_{\mu=\nu+1}^{n-1}\alpha_\mu. \tag{7}$$




Since each of the $\alpha_n$ is a fraction between zero and one, we can see by inspection of (7) that $p_{1,n}$ becomes independent of its initial value, $p_{1,1}$, as $n$ becomes large; on later trials it is essentially equal to a weighted mean of the $\pi_{21}$ values which obtained on preceding trials, with $\pi_{21,n-1}$ having most weight, $\pi_{21,n-2}$ less weight, and so on. [If $\pi_{11} = 1$ and $\pi_{21} = 0$, then $\alpha = 1$ and (6) reduces to

$$p_{1,n+1} = p_{1,n};$$

i.e., on the average no learning occurs. In all derivations presented, we shall assume this case to be excluded.] The smaller the average difference between $\pi_{11,n}$ and $\pi_{21,n}$, the more completely is the value of $p_{1,n}$ determined by the $\pi_{21}$ values of a few immediately preceding trials. As in the non-contingent case, the dependence of $p_{1,n}$ on the sequence of $\pi_{21}$ values might be described as "tracking with a lag," but in this instance it will be necessary to study some special cases in order to see just what is being "tracked." For convenience in exposition we shall limit ourselves to situations involving two response classes while describing the special cases. In a later section we shall indicate how all of the results can be extended to situations involving more than two response classes.


b. The special case of $\pi_{ij}$ constant

If $\pi_{11}$ and $\pi_{21}$ are both constant, then (6) and (7) reduce to the expressions

$$p_{1,n+1} = (1-\theta)\,p_{1,n} + \theta\big[p_{1,n}\pi_{11} + (1-p_{1,n})\,\pi_{21}\big] \tag{6a}$$

and

$$p_{1,n} = \frac{\pi_{21}}{1-\pi_{11}+\pi_{21}} - \Big(\frac{\pi_{21}}{1-\pi_{11}+\pi_{21}} - p_{1,1}\Big)(1 - \theta + \theta\pi_{11} - \theta\pi_{21})^{n-1}, \tag{7a}$$

previously derived by Estes [5] from the set-theoretical model and by Bush and Mosteller [2] from their "linear operator" model. Experimental applications of (7a) are described in references [2, 5, 13].


c. Special cases leading to linear difference equations with constant coefficients

Examination of (6) reveals that it will take the form of a linear difference equation with constant coefficients whenever $\pi_{11,n}$ and $\pi_{21,n}$ differ only by a constant. Thus, if

$$\pi_{11,n} = a_{11} + g_n$$

and

$$\pi_{21,n} = a_{21} + g_n,$$

where $g_n$ is any function that keeps $\pi_{ij,n}$ properly bounded for the range of $n$ under consideration, then (7) has the form

$$p_{1,n} = p_{1,1}\,\alpha^{n-1} + \theta\sum_{\nu=1}^{n-1}\alpha^{n-\nu-1}(a_{21} + g_\nu) = \frac{a_{21}}{1-a_{11}+a_{21}} - \Big(\frac{a_{21}}{1-a_{11}+a_{21}} - p_{1,1}\Big)\alpha^{n-1} + \theta\sum_{\nu=1}^{n-1}\alpha^{n-\nu-1}g_\nu, \tag{7b}$$

where $\alpha = 1 - \theta + \theta a_{11} - \theta a_{21}$. For experimental purposes, it will usually be most convenient to make $g_n$ a linear function of $n$, say $g_n = bn$, in which case we can perform the summation in (7b) and obtain a simple closed formula for $p_{1,n}$, viz.,

$$p_{1,n} = \frac{a_{21} + bn}{1-a_{11}+a_{21}} - \frac{b}{\theta(1-a_{11}+a_{21})^2} - \Big[\frac{a_{21} + b}{1-a_{11}+a_{21}} - \frac{b}{\theta(1-a_{11}+a_{21})^2} - p_{1,1}\Big](1 - \theta + \theta a_{11} - \theta a_{21})^{n-1}. \tag{7c}$$

The properties of (7c) are very similar to those of (3b), the corresponding solution for the non-contingent case. Regardless of the initial value $p_{1,1}$, after a sufficiently large number of trials the curve for $p_{1,n}$ approaches the straight line

$$p_{1,n} = \frac{a_{21} + bn}{1-a_{11}+a_{21}} - \frac{b}{\theta(1-a_{11}+a_{21})^2}.$$

Since $\theta$ is the only free parameter in the latter expression, its value can be estimated by fitting the straight line to data obtained from a block of trials relatively late in the learning series. It now becomes apparent just what it is that the $p_{1,n}$ curve "tracks with a lag." The first term on the right-hand side of (7c) is simply $\pi_{21,n}/(1-\pi_{11,n}+\pi_{21,n})$; at any trial, the slope of the $p_{1,n}$ curve is such that it would approach $\pi_{21,n}/(1-\pi_{11,n}+\pi_{21,n})$, the asymptote of the constant-$\pi_{ij}$ solution (7a), if $\pi_{11,n}$ and $\pi_{21,n}$ were to remain constant from that moment on. Since the $\pi_{ij,n}$ do not remain constant, the subject's curve tracks the "moving asymptote" with a lag which depends inversely on $\theta$. As in the corresponding non-contingent case, the slope of the terminal linear portion of the $p_{1,n}$ curve can be predicted in advance of the experiment, since it depends only on the values of $a_{11}$, $a_{21}$, and $b$ assigned by the experimenter.
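The closed form (7c) and its terminal straight line can be checked against iteration of (6) with $\pi_{11,n} = a_{11} + bn$ and $\pi_{21,n} = a_{21} + bn$. The values below are illustrative and keep both probabilities inside [0, 1] over the 150 trials used.

```python
a11, a21, b, theta, p1 = 0.5, 0.2, 0.002, 0.1, 0.9    # illustrative values
d = 1 - a11 + a21
alpha = 1 - theta + theta * a11 - theta * a21

def line(n):
    """Terminal straight line approached by p_{1,n}."""
    return (a21 + b * n) / d - b / (theta * d ** 2)

def p_7c(n):
    """(7c)."""
    return line(n) - (line(1) - p1) * alpha ** (n - 1)

def p_iterated(n):
    """Iterate (6) with pi_{11,v} = a11 + b v and pi_{21,v} = a21 + b v."""
    p = p1
    for v in range(1, n):
        pi11, pi21 = a11 + b * v, a21 + b * v
        p = (1 - theta) * p + theta * (p * pi11 + (1 - p) * pi21)
    return p

print(max(abs(p_7c(n) - p_iterated(n)) for n in range(1, 151)))
print(p_7c(150) - line(150), b / d)      # residual transient; slope of the line
```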


d. Contingent case with more than two response classes

The results of the preceding section can be [extended] to situations involving more than two response classes. If $\pi_{ij,n} = a_{ij} + g_n$ for all $i$ ($i = 1, 2, \ldots, r$), then for a situation involving $r$ response classes we obtain, by application of (6), the system of difference equations

$$\begin{aligned}
p_{1,n+1} &= (1-\theta+\theta a_{11})\,p_{1,n} + \theta a_{21}\,p_{2,n} + \cdots + \theta a_{r1}\,p_{r,n} + \theta g_n\\
p_{2,n+1} &= \theta a_{12}\,p_{1,n} + (1-\theta+\theta a_{22})\,p_{2,n} + \cdots + \theta a_{r2}\,p_{r,n} + \theta g_n\\
&\;\;\vdots\\
p_{r,n+1} &= \theta a_{1r}\,p_{1,n} + \theta a_{2r}\,p_{2,n} + \cdots + (1-\theta+\theta a_{rr})\,p_{r,n} + \theta g_n,
\end{aligned} \tag{8}$$

which must be solved simultaneously in order to obtain the desired formulas for $p_{j,n}$. To facilitate the solution, we define an operator $E$ as follows:

$$E\,p_{j,n} = p_{j,n+1}.$$

Then the system (8) can be rewritten in the form

$$\begin{aligned}
(E - 1 + \theta - \theta a_{11})\,p_{1,n} - \theta a_{21}\,p_{2,n} - \cdots - \theta a_{r1}\,p_{r,n} &= \theta g_n\\
-\theta a_{12}\,p_{1,n} + (E - 1 + \theta - \theta a_{22})\,p_{2,n} - \cdots - \theta a_{r2}\,p_{r,n} &= \theta g_n\\
&\;\;\vdots\\
-\theta a_{1r}\,p_{1,n} - \theta a_{2r}\,p_{2,n} - \cdots + (E - 1 + \theta - \theta a_{rr})\,p_{r,n} &= \theta g_n.
\end{aligned}$$

Now the symbol $E$ may be treated as a number while we proceed to solve the system of equations by standard methods. The solution [...] the $p_{j,n}$ as a polynomial in powers of $E$. Then to obtain [...]

If the form of the function $g_n$ is such that $a_{ij} + g_n$ approaches an asymptotic value, $\pi_{ij}$, as $n$ increases, [...] the asymptotic values $\lambda_j$ of the $p_{j,n}$ can be obtained by solving simultaneously

$$\begin{aligned}
-(1-\pi_{11})\,\lambda_1 + \pi_{21}\lambda_2 + \cdots + \pi_{r1}\lambda_r &= 0\\
\pi_{12}\lambda_1 - (1-\pi_{22})\,\lambda_2 + \cdots + \pi_{r2}\lambda_r &= 0\\
&\;\;\vdots
\end{aligned}$$

This system of equations has two properties of special interest. First, the asymptotic response probabilities $\lambda_j$ are completely determined by the parameters $\pi_{ij}$. Second, the mean asymptotic probabilities of the reinforcing events are determined by the same system of equations. If we let $\pi_j$ represent the mean asymptotic probability of $E_j$, then clearly

$$\pi_j = \lambda_1\pi_{1j} + \lambda_2\pi_{2j} + \cdots + \lambda_r\pi_{rj}.$$


But inspecting the $j$th row in the equation system above, we see that

$$\lambda_j = \lambda_1\pi_{1j} + \lambda_2\pi_{2j} + \cdots + \lambda_r\pi_{rj}.$$

Therefore $\pi_j = \lambda_j$; i.e., asymptotically the mean probability of a response is equal to the mean probability of the corresponding reinforcing event. This is another example of the "probability matching" which has frequently been observed in studies of probability learning with simple, non-contingent reinforcement [3, 5, 8, 9, 13]. In the contingent case, there are no fixed environmental probabilities to be matched by the subject, but the matching property again obtains when the stimulus-response [...] of statistical equilibrium.

In the special case when $g_n = 0$ for all $n$ and $a_{ij} = \pi_{ij}$, the value of $p_{j,n}$ will be given by an expression of the form

$$p_{j,n} = \lambda_j + C_1x_1^{\,n} + C_2x_2^{\,n} + \cdots + C_{r-1}x_{r-1}^{\,n}, \tag{9}$$

where the absolute value of each of the $x_i$ is in the range $0 \le |x_i| \le 1$, and the $C_i$ are constants whose values depend on the initial $p_j$ values and on the $\pi_{ij}$. It may be noted that the $C_i$ need not all have the same sign, and consequently the curve of $p_j$ will not always be a monotone function of $n$. Some of the curve forms which arise are illustrated in Fig. 4; the curves in the upper and lower panels represent the same value of $\theta$ but different combinations of $\pi_{ij}$, viz.,

                Upper panel              Lower panel
              E_1    E_2    E_3        E_1    E_2    E_3
      A_1     .33    .33    .33        .33    .33    .33
      A_2     .50    .50    .00        .50    .50    .00
      A_3     .17    .00    .83        .83    .00    .17

It will be apparent from inspection of Fig. 4 that in this case, unlike the non-contingent case, not only the asymptotes of the learning curves but also the relative rates at which the curves approach their asymptotes depend upon the probabilities of reinforcement.
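The matching property can be illustrated for the two schedules tabled above. The sketch iterates system (8) with $g_n = 0$ and $a_{ij} = \pi_{ij}$, using $\theta = .015$ from the caption of Fig. 4, an assumed uniform starting vector, and the exact value 1/3 in place of the rounded .33 so that each row sums to one. It then checks that $\lambda_j = \sum_i \lambda_i \pi_{ij}$, that is, that each asymptotic response probability matches the mean probability of its reinforcing event.

```python
def iterate_system(pi, theta, p, trials):
    """System (8) with g_n = 0: p_j <- (1 - theta) p_j + theta * sum_i p_i pi[i][j]."""
    r = len(p)
    for _ in range(trials):
        p = [(1 - theta) * p[j] + theta * sum(p[i] * pi[i][j] for i in range(r))
             for j in range(r)]
    return p

third = 1 / 3
upper = [[third, third, third], [0.5, 0.5, 0.0], [0.17, 0.0, 0.83]]
lower = [[third, third, third], [0.5, 0.5, 0.0], [0.83, 0.0, 0.17]]

for name, pi in (("upper", upper), ("lower", lower)):
    lam = iterate_system(pi, 0.015, [third, third, third], 5000)
    event = [sum(lam[i] * pi[i][j] for i in range(3)) for j in range(3)]
    print(name, [round(x, 4) for x in lam], [round(x, 4) for x in event])
```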


FIGURE 4
[Upper and lower panels: RESPONSE PROBABILITY plotted against TRIAL]
Curves describing changes in response probability under simple, contingent reinforcement. Probabilities of reinforcement following responses $A_1$ and $A_2$ are the same under Schedules I and II, but probabilities following $A_3$ differ. The parameter $\theta$ has been taken equal to .015.

e. Contingency with a lag

The contingent cases discussed above cover the common types of experiments in which the probabilities of such reinforcing events as rewards or knowledge of results on any trial depend on the subject's response on that trial. Now we wish to extend the theory to include the more remote contingencies which arise in games or similar two-person situations. In this type of situation it is a common strategy to make one's choice of moves, or plays, on a given trial depend upon the choices made by one's adversary on preceding trials. Regarding the first player as the experimenter and the second as the subject, we can represent this kind of strategy in the present model by letting the probability of reinforcement of a given response on trial $n$ depend upon the subject's response on some preceding trial, say $n - \nu$. By the same reasoning used in the case of (2) and (6), we can write a [difference equation for the probability of] response $A_j$ on any given trial:

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\sum_i p_{i,n-\nu}\,\pi_{ij,n}, \tag{10}$$

where $\pi_{ij,n}$ represents the probability of reinforcement of $A_j$ on trial $n$ given that $A_i$ occurred on trial $n - \nu$. Equation (10) is difficult to handle unless the functions $\pi_{ij,n}$ differ only by constant terms [i.e., $\pi_{11,n} = a_{11} + g_n(\nu)$, $\pi_{21,n} = a_{21} + g_n(\nu)$, etc.]; for this case, (10) reduces to a linear difference equation with constant coefficients

$$p_{j,n+1} = (1-\theta)\,p_{j,n} + \theta\sum_i a_{ij}\,p_{i,n-\nu} + \theta g_n, \tag{10a}$$


which can be solved explicitly.

In order to exhibit some of the most readily testable implications of this model for experiments involving remote contingencies, let us consider the special case of two response classes and $\pi_{ij}$ independent of $n$. Then, for a given contingency lag $\nu$, the $\pi_{ij}$ can be treated as constants, and (10) reduces to

$$p_{1,n+1} = (1-\theta)\,p_{1,n} + \theta\big[p_{1,n-\nu}\pi_{11} + (1-p_{1,n-\nu})\,\pi_{21}\big] = (1-\theta)\,p_{1,n} + \theta(\pi_{11}-\pi_{21})\,p_{1,n-\nu} + \theta\pi_{21}. \tag{10b}$$

Now [excluding, as before, the case $\pi_{11} = 1$ and $\pi_{21} = 0$, for which $p_{1,n+1} = p_{1,n}$] we can obtain the asymptotic probability of response $A_1$ by setting $p_{1,\infty} = p_{1,n+1} = p_{1,n} = p_{1,n-\nu}$ in (10b) and solving, viz.,

$$p_{1,\infty} = \frac{\pi_{21}}{1-\pi_{11}+\pi_{21}}.$$

We obtain the interesting prediction that asymptotic probability is independent of the contingency lag $\nu$. The complete solution of (10b) is (cf. [12] for the detailed method of derivation and for the treatment of cases in which the characteristic roots are not all distinct)

$$p_{1,n} = p_{1,\infty} + C_1x_1^{\,n} + C_2x_2^{\,n} + \cdots + C_{\nu+1}x_{\nu+1}^{\,n}, \tag{10c}$$

where the $C_i$ are constants which can be evaluated from the initial conditions of the experiment and the $x_i$ are the roots of the characteristic equation

$$x^{\nu+1} - (1-\theta)\,x^{\nu} - \theta(\pi_{11}-\pi_{21}) = 0.$$

Except for the degenerate case ($\pi_{11} = 1$ and $\pi_{21} = 0$), the characteristic roots will have absolute values in the range $0 \le |x| < 1$, and therefore $x^n$ will tend to zero as $n$ increases. If the lag $\nu$ is zero, then the characteristic equation is simply

$$x - (1-\theta) - \theta(\pi_{11}-\pi_{21}) = 0,$$

which has the single root

$$x = 1 - \theta + \theta\pi_{11} - \theta\pi_{21},$$

and (10c) reduces to (7a), as it should.




If the lag $v$ is 1, i.e., probability of reinforcement on a given trial depends on the response of the immediately preceding trial, the characteristic equation is

$x^2 - (1-\theta)\,x - \theta(\pi_{11} - \pi_{21}) = 0$,

which has the two roots

$x_1 = \dfrac{(1-\theta) + \sqrt{(1-\theta)^2 + 4\theta(\pi_{11} - \pi_{21})}}{2}$

and

$x_2 = \dfrac{(1-\theta) - \sqrt{(1-\theta)^2 + 4\theta(\pi_{11} - \pi_{21})}}{2}$.

The properties of the solution will depend on the relative magnitudes of $\pi_{11}$ and $\pi_{21}$, as follows:

1. If $\pi_{11} = \pi_{21}$, then $x_1$ and $x_2$ are equal to $1-\theta$ and 0, respectively, and (10c) reduces to the (3a) of the simple non-contingent case.

2. If $\pi_{11} > \pi_{21}$, then $x_1$ and $x_2$ are real numbers, positive and negative, respectively, with absolute values between 0 and 1. Comparing the larger root, $x_1$, with the characteristic root for the case of lag 0, we find that the difference between the former and the latter is always non-negative when $\pi_{11} > \pi_{21}$; i.e.,

$\dfrac{(1-\theta) + \sqrt{(1-\theta)^2 + 4\theta(\pi_{11} - \pi_{21})}}{2} - (1 - \theta + \theta\pi_{11} - \theta\pi_{21}) \ge 0$,

and the equality holds only in the degenerate cases ($\theta = 0$, or $\pi_{11} = 1$ and $\pi_{21} = 0$) for which (10b) is inapplicable. Thus it can be predicted that when $\pi_{11} > \pi_{21}$, the mean learning curve will approach its asymptote more slowly for the case of lag 1 than for the case of lag 0.

3. If $\pi_{11} < \pi_{21}$, then neither $x_1$ nor $x_2$ is negative. Both are real numbers in the interval $0 < x < 1-\theta$ if the quantity $(1-\theta)^2 + 4\theta(\pi_{11} - \pi_{21})$ is positive; otherwise they are complex numbers with moduli in the interval $0 < |x| < 1$.

In general the estimation of parameters from data will be difficult when there is a contingency lag. Tests of this aspect of the theory can be achieved most conveniently by obtaining estimates of $\theta$ from data obtained under conditions of simple non-contingent or contingent reinforcement and then testing predicted relationships for experiments run under similar conditions except for the introduction of contingency lags. Predictions about asymptotic probabilities are, of course, independent of $\theta$ and thus can be made in advance of any experiment.
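The three lag-1 cases above can be checked by computing the two characteristic roots directly. A minimal sketch in Python, using `cmath.sqrt` so that the complex roots of case 3 come out without special handling; the parameter values in the usage lines are hypothetical.

```python
import cmath


def lag1_roots(theta, pi11, pi21):
    """Roots of x**2 - (1 - theta)*x - theta*(pi11 - pi21) = 0.

    cmath.sqrt returns a complex number, so a negative discriminant (case 3)
    yields the complex-conjugate pair directly.
    """
    a = 1 - theta
    s = cmath.sqrt(a * a + 4 * theta * (pi11 - pi21))
    return (a + s) / 2, (a - s) / 2


def lag0_root(theta, pi11, pi21):
    """Single characteristic root for lag 0: 1 - theta + theta*pi11 - theta*pi21."""
    return 1 - theta + theta * (pi11 - pi21)


if __name__ == "__main__":
    # Hypothetical parameter values, one set per case.
    print("case 1:", lag1_roots(0.2, 0.5, 0.5))   # roots 1 - theta and 0
    x1, x2 = lag1_roots(0.2, 0.7, 0.4)
    print("case 2:", x1.real, x2.real, "lag 0:", lag0_root(0.2, 0.7, 0.4))
    z1, z2 = lag1_roots(0.5, 0.2, 0.9)
    print("case 3:", z1, z2, "moduli:", abs(z1), abs(z2))
```

In case 2 the larger root exceeds the lag-0 root, which corresponds to the slower approach to asymptote described above. In case 3 the modulus of each complex root equals $\sqrt{\theta(\pi_{21}-\pi_{11})}$, because the product of the two roots is $-\theta(\pi_{11}-\pi_{21})$.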


Interpretation of the Model

The theory of reinforcement developed here might be characterized as descriptive, rather than explanatory. The concept of reinforcing event represents an abstraction from a considerable body of experimental data on conditioning and simple motor and verbal learning. In a number of standard experimental situations used to study these elementary forms of learning, it is possible to identify experimentally defined events or operations whose effects upon response probability appear to satisfy the quantitative laws expressed by (1a) and (1b). The first task of our quantitative theory is simply to describe how learning should proceed under various experimental arrangements when these particular experimental operations are assigned the role of reinforcing events. A second task, which becomes important once the theory has survived preliminary tests, is to facilitate the identification of reinforcing operations in new empirical situations. We can test hypotheses concerning a class of events termed reinforcers only if we can state detailed testable consequences of class membership. To the extent that the model elaborated here acquires standing as a descriptive theory, it will serve also to specify the quantitative properties which define membership in the class of reinforcers. Although a quantitative theory of this kind does not contribute immediately to an intensive definition, or interpretive account, of reinforcement, it does provide an additional research tool which may contribute to the construction and testing of explanatory theories.


REFERENCES

[1] Burke, C. J. and Estes, W. K. A component model for stimulus variables in discrimination learning. Psychometrika, 1957, 22, 133-145.

[2] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.

[3] Detambel, M. H. A test of a model for multiple-choice behavior. J. exp. Psychol., 1955, 49, 97-104.

[4] Estes, W. K. Toward a statistical theory of learning. Psychol. Rev., 1950, 57, 94-107.

[5] Estes, W. K. Individual behavior in uncertain situations. In R. M. Thrall, C. H. Coombs, and R. L. Davis (Eds.), Decision processes. New York: Wiley, 1954, pp. 127-137.

[6] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.

[7] Estes, W. K. and Burke, C. J. Application of a statistical model to simple discrimination learning in human subjects. J. exp. Psychol., 1955, 50, 81-88.

[8] Estes, W. K. and Straughan, J. H. Analysis of a verbal conditioning situation in terms of statistical learning theory. J. exp. Psychol., 1954, 47, 225-234.

[9] Grant, D. A., Hake, H. W., and Hornseth, J. P. Acquisition and extinction of a verbal conditioned response with differing percentages of reinforcement. J. exp. Psychol., 1951, 42, 1-5.

[10] Hake, H. W. and Hyman, R. Perception of the statistical structure of a random series of binary symbols. J. exp. Psychol., 1953, 45, 64-74.

[11] Humphreys, L. G. Acquisition and extinction of verbal expectations in a situation analogous to conditioning. J. exp. Psychol., 1939, 25, 294-301.

[12] Jordan, C. Calculus of finite differences. New York: Chelsea, 1950.

[13] Neimark, E. D. Effects of type of non-reinforcement and number of alternative responses in two verbal conditioning situations. J. exp. Psychol., 1956, 52, 209-220.

[14] Richardson, C. H. An introduction to the calculus of finite differences. New York: Van Nostrand, 1954.


Manuscript received 8/27/56 


PSYCHOMETRIKA, VOL. 22, NO. 2
JUNE, 1957


A COMPONENT MODEL FOR STIMULUS VARIABLES
IN DISCRIMINATION LEARNING*

C. J. BURKE AND W. K. ESTES
INDIANA UNIVERSITY

A function is derived describing the conditioning of a single stimulus component in a discriminative situation. This function, together with the combina[...] of statistical learning theory [5, 12], generates empirically testable formulas for learning of classical two-alternative discriminations, probabilistic discriminations, and discriminations based on the outcomes of preceding trials in partial reinforcement.

From the set-theoretical stimulus model developed by Estes and Burke [5], together with a descriptive theory of reinforcement [4], it is possible to derive a model for certain aspects of discrimination learning. This model is assumed to describe the formation and abolition of conditional relations ("connections") between response classes and the independently variable components or aspects of discriminative stimuli. It does not take account of patterning effects, "observing responses" [13], adaptation of irrelevant cues [11], or many other complications, and thus is not expected to provide a generally adequate account of discrimination learning. The model does, however, describe the data of certain especially simplified experiments [1, 6, 7, 12]; these findings have been taken as evidence for the assumption that it may eventually form a part of a more comprehensive and adequate theory.

Since our primary concern in this paper is with stimulus variables, we shall assume the simplest possible conditions relative to other types of variables. For all functions derived, the reference class of experiments will involve two stimulating situations which are presented in a random order, two alternative response classes with corresponding reinforcing events, and determinate, non-contingent reinforcement.†


*The researches on which this paper is based were facilitated by a grant from the National Science Foundation.

†By "determinate, non-contingent reinforcement" we mean that exactly one predetermined reinforcing event occurs on each trial and that the probabilities of reinforcing events are not conditional upon the subject's responses.

Definitions and Assumptions

We shall require the following terms:

$A_1$ and $A_2$: Mutually exclusive and exhaustive response classes.

$E_1$ and $E_2$: Reinforcing events for $A_1$ and $A_2$, respectively.

$T_1$ and $T_2$: Types of trials (corresponding to the two stimulating situations which are to be discriminated).

$\pi_{hj}$ (with $\sum_j \pi_{hj} = 1$): Probability of $E_j$ on trials of type $T_h$.

$\beta$ and $1-\beta$: Relative frequencies of $T_1$ and $T_2$ trials during a learning series.

$S_1$ and $S_2$: Sets of stimulus elements available for sampling only on $T_1$ and only on $T_2$ trials, respectively.

$S_c$: Set of stimulus elements available for sampling on all trials (i.e., elements common to the two stimulating situations).

$S$: Set of all elements associated with either of the two stimulating situations; i.e., $S = S_1 \cup S_2 \cup S_c$, where $\cup$ indicates set-theoretical summation.

$N_i$: Number of elements in $S_i$.

$\theta_{i1}$ and $\theta_{i2}$: Sampling probability of the $i$th element of $S$ on trials of type $T_1$ and $T_2$, respectively.

$F_{i,n}$: Probability that the $i$th element of $S$ is "connected to" response $A_1$ on trial $n$ (i.e., at the beginning of trial $n$).

$F_i$: Limiting value of $F_{i,n}$ for large $n$.

$p_{hj,n}$: Probability of $A_j$, in the stimulating situation corresponding to $T_h$, on trial $n$.

$p_n$: Probability of a "correct response" on trial $n$, where a correct response is defined as $A_1$ on $T_1$ trials and $A_2$ on $T_2$ trials, so that

(1) $p_n = \beta\, p_{11,n} + (1-\beta)\, p_{22,n}$.

Our assumptions about learning are taken directly from previous treatments of learning in elementary, non-discriminative situations [4, 5, 8]. Specifically, we assume:

a. At any time, each stimulus element in $S$ is "connected to" exactly one of the responses $A_1$ or $A_2$.

b. The $F_{i,n}$ remain fixed in value except when reinforcing events occur.

c. If $E_1$ occurs on trial $n$, then

(2) $F_{i,n+1} = (1-\theta_{ih})\,F_{i,n} + \theta_{ih}$.

d. If $E_2$ occurs on trial $n$, then

(3) $F_{i,n+1} = (1-\theta_{ih})\,F_{i,n}$.

In (2) and (3), $h = 1$ or $h = 2$ according as the trial is of type $T_1$ or $T_2$. The $F_{i,n}$ are brought into relation to response probabilities by the assumption that probability of a given response on any trial is equal to the proportion of stimulus elements in the trial sample that are "connected to" the response. Intuitively, a "connection" is obviously intended to represent a learned association, but its formal properties are limited to those expressed above.


Learning Functions for a Single Stimulus Element

General case

Using the terms and assumptions given above, we can write a general difference equation expressing $F_{i,n}$, the probability that the $i$th element of $S$ is connected to response $A_1$ on trial $n$, as a linear function of the probability on the preceding trial.

(4) $F_{i,n} = \beta\,[(1-\theta_{i1})F_{i,n-1} + \theta_{i1}\pi_{11}] + (1-\beta)\,[(1-\theta_{i2})F_{i,n-1} + \theta_{i2}\pi_{21}] = [1 - \beta\theta_{i1} - (1-\beta)\theta_{i2}]\,F_{i,n-1} + \beta\theta_{i1}\pi_{11} + (1-\beta)\theta_{i2}\pi_{21}$.

If the quantity $F_{i,n-1}$ on the right side of (4) represents a probability associated with a particular subject on trial $n-1$, then clearly the quantity $F_{i,n}$ on the left represents the expected value on trial $n$, where the expectation is taken over the possible stimulus sampling outcomes and reinforcing outcomes of trial $n-1$. If an experiment were repeated many times (or, equivalently, if a population of subjects with like parameter values were run simultaneously through an experiment) there would result a distribution of values of $F_{i,n-1}$ and therefore also of $F_{i,n}$. For each value of $F_{i,n-1}$, the conditional mean value of $F_{i,n}$ is given by (4). Therefore the relation between the mean values of $F_{i,n}$ and $F_{i,n-1}$ is also given by (4). We shall now suppose that this averaging has been carried out, and in the remainder of this paper we shall interpret $F_{i,n}$ and $F_{i,n-1}$ in (4) and all following equations as expected values for the population of all repetitions of a given experiment (or, equivalently, for all subjects having a given set of parameter values). With this interpretation, and with the restriction that the parameters $\theta_{ih}$ and $\pi_{hj}$ are constant over trials, (4) can be solved. The resulting expression for $F_{i,n}$, readily verified by induction, is

(5) $F_{i,n} = F_i - (F_i - F_{i,1})\,[1 - \beta\theta_{i1} - (1-\beta)\theta_{i2}]^{\,n-1}$,

where

(6) $F_i = \dfrac{\beta\theta_{i1}\pi_{11} + (1-\beta)\theta_{i2}\pi_{21}}{\beta\theta_{i1} + (1-\beta)\theta_{i2}}$.

The quantity $[1 - \beta\theta_{i1} - (1-\beta)\theta_{i2}]$ is bounded in the interval 0 to 1, so $F_{i,n}$ describes a negatively accelerated curve with $F_i$ as the asymptote. It will be apparent from inspection of (6) that $F_i$ is the conditional probability of $E_1$ on trials when the $i$th element of $S$ is sampled; i.e.,

(7) $F_i = P(E_1 \mid X_i = 1)$,

where $X_i$ is a random variable that takes on the values 1 and 0 according as element $i$ is or is not sampled on any trial.
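Because (4) describes an expectation over sampling and reinforcing outcomes, (5) and (6) can be checked against a simulated population of subjects. The sketch below illustrates the idea under stated assumptions and is not the authors' procedure. It treats the element as all-or-none (connected to $A_1$ or to $A_2$, as in assumption a), samples it with probability $\theta_{ih}$, and reconnects it according to the reinforcing event, which reproduces (2) and (3) on average. The parameter values are hypothetical.

```python
import random


def F_closed(n, F1, beta, th1, th2, pi11, pi21):
    """Equations (5)-(6): expected F_{i,n} for one element."""
    rate = beta * th1 + (1 - beta) * th2
    F_inf = (beta * th1 * pi11 + (1 - beta) * th2 * pi21) / rate   # (6)
    return F_inf - (F_inf - F1) * (1 - rate) ** (n - 1)            # (5)


def simulate_F(n, F1, beta, th1, th2, pi11, pi21, n_subjects=20000, seed=1):
    """Monte Carlo estimate of F_{i,n} for an all-or-none element.

    Each simulated subject's element is connected to A1 (1) or A2 (0). On every
    trial the type is T1 with probability beta, the element is sampled with
    probability theta_h, and, if sampled, it is reconnected to A1 on E1
    (probability pi_h1) and to A2 on E2.
    """
    rng = random.Random(seed)
    connected = 0
    for _ in range(n_subjects):
        state = 1 if rng.random() < F1 else 0
        for _ in range(n - 1):          # trials 1, ..., n-1 precede the start of trial n
            if rng.random() < beta:
                theta, pi_e1 = th1, pi11
            else:
                theta, pi_e1 = th2, pi21
            if rng.random() < theta:    # element sampled on this trial
                state = 1 if rng.random() < pi_e1 else 0
        connected += state
    return connected / n_subjects


if __name__ == "__main__":
    args = (10, 0.5, 0.5, 0.3, 0.1, 0.9, 0.2)   # hypothetical parameter values
    print(F_closed(*args), simulate_F(*args))
```

The simulated mean should agree with (5) up to Monte Carlo error (roughly $\pm 0.003$ for 20,000 simulated subjects), and the asymptote in (6) is the fraction of sampling occasions that coincide with $E_1$, as (7) states.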


Special cases

Learning functions for simple acquisition, classical discrimination learning, and probabilistic discrimination learning are now obtainable as special cases of (5).




a. A discriminative situation reduces to simple acquisition if only one 
stimulating situation is ever presented. When we express this restriction by 
setting $\beta = 1$ and dropping the $h$ subscripts in our general discrimination function, (5) reduces to

(8) $F_{i,n} = \pi_{11} - (\pi_{11} - F_{i,1})(1-\theta_i)^{n-1}$.


b. In a classical discrimination problem, the two stimuli to be discriminated (e.g., black card vs. white card, bright light vs. dim light, circle vs. triangle) together with background, or contextual stimulation, are represented by two stimulus sets. On $T_1$ trials, the set $S_1 \cup S_c$ is available for sampling, and on $T_2$ trials the set $S_2 \cup S_c$ is available. It will often be assumed that $\theta$ values are equal for all elements within each of the three subsets $S_1$, $S_2$, and $S_c$. With this assumption, (5) becomes for $S_1$:

(9) $F_{1,n} = \pi_{11} - (\pi_{11} - F_{1,1})(1 - \beta\theta_1)^{n-1}$;

for $S_2$:

(10) $F_{2,n} = \pi_{21} - (\pi_{21} - F_{2,1})[1 - (1-\beta)\theta_2]^{n-1}$;

and for $S_c$:

(11) $F_{c,n} = \pi_c - (\pi_c - F_{c,1})(1 - \theta_c)^{n-1}$,

where

$\pi_c = \beta\pi_{11} + (1-\beta)\pi_{21}$.


c. Traditionally, discrimination learning has been regarded as a matter of distinguishing two (or more) stimulating situations which differ with respect to certain component stimuli or stimulus attributes; and discrimination theories have been limited to this paradigm. More generally, however, there is a basis for discrimination learning whenever some of the components or attributes of a situation bear non-random relationships to reinforcing events. If two "situations" to be discriminated include the same stimulus components, and differ only with respect to the sampling probabilities of the components, we shall speak of probabilistic discrimination learning. Just as partial reinforcement represents a natural generalization of simple acquisition and extinction procedures, probabilistic discrimination learning represents a natural generalization of the conventional discrimination paradigm.

In the terms defined above, the condition for probabilistic discrimination learning relative to the two "situations" sampled on trials of types $T_1$ and $T_2$ is simply that there be some difference between the distributions of $\theta_{i1}$ and $\theta_{i2}$. A simple arrangement which has proved convenient for experimental tests of the theory [cf. 7] is the following. The reinforcement probabilities $\pi_{11}$ and $\pi_{21}$ are set equal to unity and zero, respectively; i.e., response $A_1$ is always reinforced on $T_1$ trials and $A_2$ is always reinforced on $T_2$ trials. The stimulating situation includes $N$ separably manipulable components. On $T_1$ trials, the sampling probabilities of the components, $\theta_{i1}$ for $i = 1, 2, \ldots, N$, are given by the linear function

(12) $\theta_{i1} = a_1 + b_1 i$;

on $T_2$ trials the sampling probabilities, $\theta_{i2}$ for $i = 1, 2, \ldots, N$, are given by

(13) $\theta_{i2} = a_2 + b_2 i$.

In this case, (5) becomes

(14) $F_{i,n} = \dfrac{\beta(a_1 + b_1 i)}{a + bi} - \left[\dfrac{\beta(a_1 + b_1 i)}{a + bi} - F_{i,1}\right](1 - a - bi)^{n-1}$,

where

$a = \beta a_1 + (1-\beta) a_2$

and

$b = \beta b_1 + (1-\beta) b_2$.

Now if we set $b_2 = -b_1$ and $\beta = 1/2$, values which, in fact, were used in the experimental application referred to above, the asymptotic value of $F_{i,n}$ is a linear function of $i$:

$F_i = \dfrac{1}{2a}(a_1 + b_1 i)$.

This case is obviously advantageous for statistical tests of the correspondence between predicted and observed values of $F_i$. A convenient estimator of $F_i$ is the proportion of $A_1$ responses evoked by the $i$th stimulus component when presented alone.
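A short sketch of the asymptote of (14), with hypothetical design values chosen to match the restrictions used later in the text ($a_1 = 0$, $a_2 = (N+1)b_1$, $b_2 = -b_1$, $\beta = 1/2$). Under these choices $a = b_1(N+1)/2$ and $b = 0$, so each $F_i$ reduces to $(a_1 + b_1 i)/(2a)$, which here is simply $i/(N+1)$.

```python
def asymptotic_F(i, beta, a1, b1, a2, b2):
    """Limit of (14) as n grows, with pi11 = 1 and pi21 = 0."""
    a = beta * a1 + (1 - beta) * a2
    b = beta * b1 + (1 - beta) * b2
    return beta * (a1 + b1 * i) / (a + b * i)


if __name__ == "__main__":
    N, b1 = 5, 0.02                     # hypothetical design values
    a1, a2 = 0.0, (N + 1) * b1          # theta_i1 = b1*i, theta_i2 = b1*(N + 1 - i)
    values = [asymptotic_F(i, 0.5, a1, b1, a2, -b1) for i in range(1, N + 1)]
    print([round(v, 4) for v in values])
```

Because the predicted $F_i$ rise in equal steps across components, a straight-line fit to the observed proportions of $A_1$ responses per component gives a direct test of the model.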

Learning Functions in Terms of Response Probability

Given the formulas for $F_{i,n}$, it is a purely mathematical problem to derive expressions for response probability. Two principal cases arise: (a) that in which one wishes to predict response probability in the presence of a specified sample of stimulus elements (as might be the case in an experiment on stimulus compounding or transfer); and (b) that in which one wishes to predict response probability when stimulus elements are being sampled randomly from a specified population (as might be the case in experiments on acquisition and reversal of discriminations). Case (a) is the simpler, since expected response probability will be given directly by the arithmetic mean of the $F_{i,n}$, i.e.,

(15) $p_{h1,n} = \dfrac{1}{K} \sum_i F_{i,n}$,

where the summation is taken over the $K$ elements in the given sample.

In case (b) we shall follow the procedure used in our earlier treatment of simple learning [5] and replace the arithmetic mean of the $F_{i,n}$ with the weighted mean:

(16) $p_{h1,n} = \dfrac{1}{N\bar\theta_h} \sum_i \theta_{ih} F_{i,n}$

(the summation being taken over the set $S_h$). Equation (16) does not, in general, give the exact value for the mean proportion of $A_1$-connected elements [...] which elements are [...] is very small, (16) may not [...] since direct computation of ex[...]

To illustrate the derivation of learning functions by means of (16), we consider some special cases which have arisen in experimental applications of the model.


Acquisition with random reinforcement

If an experiment involves but a single stimulating situation, the acquisition function can be obtained by substituting (8) into (16), viz.,

(17) $p_{1,n} = \dfrac{1}{N\bar\theta} \sum_i \theta_i \left[\pi_{11} - (\pi_{11} - F_{i,1})(1-\theta_i)^{n-1}\right] = \pi_{11} - (\pi_{11} - p_{1,1})\, \dfrac{1}{N\bar\theta} \sum_i \theta_i (1-\theta_i)^{n-1}$.

In the derivation of (17), and in all ensuing derivations, we assume for simplicity that the values of $F_{i,1}$ and $\theta_i$ are uncorrelated. The most interesting consequence of (17) is the prediction that asymptotically response probability should approach the probability of reinforcement regardless of the distribution of $\theta_i$ values. If all of the $\theta_i$ are equal, (17) reduces to the function

$p_{1,n} = \pi_{11} - (\pi_{11} - p_{1,1})(1-\theta)^{n-1}$

used by Estes and Straughan [8], Neimark [10] and others to describe acquisition under non-contingent random reinforcement.
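The weighted mean (16) applied to (8) can be evaluated directly. This Python sketch assumes, as a simplification beyond the text, that every element starts from the same $F_{i,1}$ (which makes the $F_{i,1}$ trivially uncorrelated with the $\theta_i$); the $\theta_i$ values are hypothetical. It illustrates that the curve approaches $\pi_{11}$ whatever the distribution of $\theta_i$, and that it collapses to the single exponential when all $\theta_i$ are equal.

```python
def acquisition_curve(n, pi11, F1, thetas):
    """(17): theta-weighted mean of the element curves (8); every F_{i,1} equals F1."""
    weighted = sum(t * (pi11 - (pi11 - F1) * (1 - t) ** (n - 1)) for t in thetas)
    return weighted / sum(thetas)      # sum(thetas) plays the role of N * theta-bar


if __name__ == "__main__":
    thetas = [0.05, 0.1, 0.3, 0.6]      # hypothetical unequal sampling probabilities
    for n in (1, 5, 20, 200):
        print(n, round(acquisition_curve(n, 0.75, 0.5, thetas), 4))
```

With unequal $\theta_i$ the curve is a weighted sum of exponentials rather than a single one, so a one-parameter fit of $\theta$ to such data would only approximate it.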
Classical discrimination learning

For discrimination experiments involving two stimulating situations which differ with respect to some of their components, learning functions can be obtained by substituting into (16) from (9) and (11) or from (10) and (11), the former yielding probability of response $A_1$ on $T_1$ trials and the latter probability of $A_1$ on $T_2$ trials. Using the notation $\bar\theta_{1c}$ for the mean value of $\theta$ over all elements available for sampling on $T_1$ trials, i.e.,

$\bar\theta_{1c} = \dfrac{N_1\theta_1 + N_c\theta_c}{N_1 + N_c}$,

we obtain for probability of $A_1$ on $T_1$ trials

$p_{11,n} = \dfrac{1}{(N_1+N_c)\bar\theta_{1c}} \left\{ N_1\theta_1 \left[\pi_{11} - (\pi_{11} - F_{1,1})(1-\beta\theta_1)^{n-1}\right] + N_c\theta_c \left[\pi_c - (\pi_c - F_{c,1})(1-\theta_c)^{n-1}\right] \right\}$.

Letting

$w_1 = \dfrac{N_1\theta_1}{(N_1+N_c)\bar\theta_{1c}}$

and

$w_c = \dfrac{N_c\theta_c}{(N_1+N_c)\bar\theta_{1c}}$,

this expression simplifies to

(18) $p_{11,n} = w_1\pi_{11} + w_c\pi_c - w_1(\pi_{11} - F_{1,1})(1-\beta\theta_1)^{n-1} - w_c(\pi_c - F_{c,1})(1-\theta_c)^{n-1}$.

Similarly, for probability of $A_1$ on $T_2$ trials, we obtain

(19) $p_{21,n} = w_2\pi_{21} + w_c\pi_c - w_2(\pi_{21} - F_{2,1})[1-(1-\beta)\theta_2]^{n-1} - w_c(\pi_c - F_{c,1})(1-\theta_c)^{n-1}$,

with $w_2$ and $w_c$ defined analogously from $N_2\theta_2$ and $N_c\theta_c$.

The predicted asymptotic accuracy of discrimination is seen to depend both on the amount of overlap between the two stimulating situations and on the values of $\pi_{11}$ and $\pi_{21}$. It has been customary in researches on discrimination learning to give uniform reinforcement in the presence of one situation and uniform nonreinforcement in the presence of the other, i.e., to let $\pi_{11} = 1$ and $\pi_{21} = 0$. This restriction is clearly unessential, however; theoretically, better than chance discrimination will be possible whenever the values of $\pi_{11}$ and $\pi_{21}$ differ, provided, of course, that $w_c$ is less than unity. In a recent experiment conducted to test the theory, $\pi_{11}$ and $\pi_{21}$ were set equal to unity and 0.5, respectively, and the empirical curves of $p_{11,n}$ and $p_{21,n}$ diverged in accordance with theoretical expectation [6].
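The asymptotes of (18) and (19), and the resulting probability of a correct response from (1), can be tabulated with a few lines of Python. This is a sketch with hypothetical set sizes and sampling probabilities. It assumes that $w_c$ is recomputed for each trial type, so that $w_1 + w_c = 1$ on $T_1$ trials and $w_2 + w_c = 1$ on $T_2$ trials.

```python
def classical_asymptotes(beta, pi11, pi21, N1, th1, N2, th2, Nc, thc):
    """Asymptotic p_{11}, p_{21} from (18)-(19) and probability correct from (1).

    Assumption: the overlap weight w_c is computed separately for each trial type.
    """
    pic = beta * pi11 + (1 - beta) * pi21          # pi_c, as defined after (11)
    w1 = N1 * th1 / (N1 * th1 + Nc * thc)          # weight of S_1 on T_1 trials
    w2 = N2 * th2 / (N2 * th2 + Nc * thc)          # weight of S_2 on T_2 trials
    p11 = w1 * pi11 + (1 - w1) * pic               # limit of (18)
    p21 = w2 * pi21 + (1 - w2) * pic               # limit of (19)
    p_correct = beta * p11 + (1 - beta) * (1 - p21)  # (1), with p_22 = 1 - p_21
    return p11, p21, p_correct


if __name__ == "__main__":
    # Hypothetical designs: half overlap, no overlap, and pi21 = 0.5 as in [6].
    print(classical_asymptotes(0.5, 1.0, 0.0, 10, 0.2, 10, 0.2, 10, 0.2))
    print(classical_asymptotes(0.5, 1.0, 0.0, 10, 0.2, 10, 0.2, 0, 0.2))
    print(classical_asymptotes(0.5, 1.0, 0.5, 10, 0.2, 10, 0.2, 10, 0.2))
```

With no common elements the predicted discrimination is perfect; with half of the sampled weight coming from common elements, accuracy falls to .75 under uniform reinforcement and nonreinforcement, and the curves for $\pi_{21} = 0.5$ still separate (.875 vs. .625).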


Probabilistic discrimination learning

When the "situations" to be discriminated differ only with respect to the sampling probabilities of component stimuli, the discrimination curves for probability of $A_1$ on $T_1$ and $T_2$ trials are given by (16) with $h = 1$ and $h = 2$, respectively:

(20) $p_{11,n} = \dfrac{1}{N\bar\theta_1} \sum_i \theta_{i1} F_{i,n}$

and

(21) $p_{21,n} = \dfrac{1}{N\bar\theta_2} \sum_i \theta_{i2} F_{i,n}$,

$F_{i,n}$ being given by (5) in each case. In general, better than chance discrimination will be theoretically predicted whenever there is any difference in the distribution of $\theta$ values on $T_1$ and $T_2$ trials. When reinforcement is uniform and the $\theta_{ih}$ distributions are linear, as assumed in the derivation of (14), the asymptotic values of $p_{11,n}$ and $p_{21,n}$ are given by

(22) $p_{11,\infty} = \dfrac{1}{N\bar\theta_1} \sum_{i=1}^{N} \dfrac{\beta(a_1 + b_1 i)^2}{a + bi}$

and

(23) $p_{21,\infty} = \dfrac{1}{N\bar\theta_2} \sum_{i=1}^{N} \dfrac{\beta(a_1 + b_1 i)(a_2 + b_2 i)}{a + bi}$,

where

$\bar\theta_h = \dfrac{1}{N} \sum_{i=1}^{N} (a_h + b_h i)$.

In an experimental test of the theory [7], we have set $b_2 = -b_1$, $a_1 = 0$, and $a_2 = (N+1)b_1$. With these restrictions,

$\bar\theta_1 = \bar\theta_2 = b_1(N+1)/2$,

and (22) and (23) reduce to

(24) $p_{11,\infty} = \dfrac{2\beta}{N(N+1)} \sum_{i=1}^{N} \dfrac{i^2}{(1-\beta)(N+1) + (2\beta-1)i}$

and

(25) $p_{21,\infty} = \dfrac{2\beta}{N(N+1)} \sum_{i=1}^{N} \dfrac{(N+1-i)\,i}{(1-\beta)(N+1) + (2\beta-1)i}$,

and we have the curious result that asymptotic response probabilities are independent of $b_1$, the slope parameter of the $\theta$ distributions. For the special case of $\beta = 1/2$, these asymptotic expressions reduce further to

(26) $p_{11,\infty} = \dfrac{2}{N(N+1)^2} \sum_{i=1}^{N} i^2 = \dfrac{2N+1}{3(N+1)}$

and

(27) $p_{21,\infty} = \dfrac{2}{N(N+1)^2} \sum_{i=1}^{N} (N+1-i)\,i = \dfrac{2}{N(N+1)^2}\left[\dfrac{N(N+1)^2}{2} - \dfrac{N(N+1)(2N+1)}{6}\right] = \dfrac{N+2}{3(N+1)}$.

By (1), the asymptotic probability of a correct response is then $\frac{1}{2}p_{11,\infty} + \frac{1}{2}(1 - p_{21,\infty}) = \dfrac{2N+1}{3(N+1)}$. Asymptotic probability of correct responding in this case is seen to vary from 1/2 to 2/3 as $N$ ranges from one to infinity.
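The reductions (24) through (27) can be verified with exact rational arithmetic. The sketch below computes the asymptotes directly from (6) and (16) for the linear sampling probabilities used in [7] and compares them with the closed forms; it uses Python's `fractions.Fraction` so that the comparisons are exact. The choices of $N$, $\beta$, and $b_1$ in the usage lines are hypothetical.

```python
from fractions import Fraction


def asymptotes_from_model(N, beta, b1):
    """p_{11,inf}, p_{21,inf} from (6) and (16) with theta_i1 = b1*i, theta_i2 = b1*(N+1-i)."""
    beta, b1 = Fraction(beta), Fraction(b1)
    th1 = [b1 * i for i in range(1, N + 1)]
    th2 = [b1 * (N + 1 - i) for i in range(1, N + 1)]
    # (6) with pi11 = 1 and pi21 = 0
    F = [beta * t1 / (beta * t1 + (1 - beta) * t2) for t1, t2 in zip(th1, th2)]
    p11 = sum(t * f for t, f in zip(th1, F)) / sum(th1)   # (16), h = 1
    p21 = sum(t * f for t, f in zip(th2, F)) / sum(th2)   # (16), h = 2
    return p11, p21


def eq24_25(N, beta):
    """Closed forms (24) and (25); note that b1 does not appear."""
    beta = Fraction(beta)
    den = [(1 - beta) * (N + 1) + (2 * beta - 1) * i for i in range(1, N + 1)]
    scale = 2 * beta / (N * (N + 1))
    p11 = scale * sum(Fraction(i * i) / d for i, d in zip(range(1, N + 1), den))
    p21 = scale * sum(Fraction((N + 1 - i) * i) / d for i, d in zip(range(1, N + 1), den))
    return p11, p21


if __name__ == "__main__":
    half = Fraction(1, 2)
    for N in (1, 2, 5, 50):
        p11, p21 = asymptotes_from_model(N, half, Fraction(1, 100))
        p_correct = half * p11 + half * (1 - p21)   # equation (1)
        print(N, p11, p21, p_correct, float(p_correct))
```

Exact fractions make the independence from $b_1$ visible as strict equality rather than agreement to some number of decimal places.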
One point of interpretation concerning our functions for probabilistic discrimination learning requires especial emphasis. The stimulus sampling probabilities $\theta_{ih}$ may be associated either with the hypothetical elements of an experimentally homogeneous stimulating situation or with independently manipulable components of a situation. In the former case, the derived functions should be applicable as they stand. In the latter case, $\theta_{ih}$ represents a product of two probabilities, i.e.,

$\theta_{ih} = \theta'_{ih}\,\theta_i$.

The parameter $\theta'_{ih}$ is the (experimenter-determined) sampling probability of the $i$th stimulus component. The parameter $\theta_i$ is the (subject-determined) probability that any element associated with the $i$th component will be sampled on trials when the $i$th component is present. In the experiment cited above [7] the $N$ stimulus components were signal lights with sampling probabilities, $\theta'_{ih}$, prescribed by the experimental design. Since all of the signal lights were similar in physical properties, we assumed that the subsets of elements associated with the various individual lights all had the same value of $\theta$. When this assumption is satisfied, the asymptotic values of $p_{11,n}$ and $p_{21,n}$ are independent of $\theta$ and therefore are predictable in advance of an experiment. (6) becomes

(28) $F_i = \dfrac{\beta\theta\theta'_{i1}\pi_{11} + (1-\beta)\theta\theta'_{i2}\pi_{21}}{\beta\theta\theta'_{i1} + (1-\beta)\theta\theta'_{i2}} = \dfrac{\beta\theta'_{i1}\pi_{11} + (1-\beta)\theta'_{i2}\pi_{21}}{\beta\theta'_{i1} + (1-\beta)\theta'_{i2}}$,

and by substitution into (16)

(29) $p_{h1,\infty} = \dfrac{1}{N_h\bar\theta_h}\sum_i \theta_{ih}F_i = \dfrac{1}{N_h\theta\bar\theta'_h}\sum_i \theta\theta'_{ih}F_i = \dfrac{1}{N_h\bar\theta'_h}\sum_i \theta'_{ih}F_i$,

where $N_h$ is the number of stimulus components presented on trials of type $T_h$, and $\bar\theta'_h$ is the mean of their sampling probabilities.


Discriminations based on traces of reinforcing stimuli

Numerous experiments concerning learning with partial reinforcement have suggested that when trials are sufficiently massed, the subject forms a discrimination based on the stimulus after-effects of reinforcement and nonreinforcement (see, e.g., [3], pp. 16-18). This type of discrimination learning should be expected to show up especially clearly if experimental arrangements prescribe a non-random relationship between probability of reinforcement on any given trial and the reinforcing outcome of the preceding trial; for an example of such an arrangement, see [9]. In order to treat trace discrimination in terms of the present model, we shall assume that one set of stimulus elements, $S_1$, is available for sampling on trials following the reinforcing event $E_1$, and a second set, $S_2$, is available on trials following an occurrence of $E_2$. For simplicity we shall limit our derivations to the case in which $S_1$ and $S_2$ are disjoint. The parameters $\pi_{11}$ and $\pi_{21}$ will now be taken to represent probabilities of $E_1$ on trials following $E_1$ and $E_2$ occurrences, respectively; $\bar\beta$ will be the average probability of $T_1$ trials, and must satisfy the relation

(30) $\bar\beta = \bar\beta\pi_{11} + (1-\bar\beta)\pi_{21}$, i.e., $\bar\beta = \dfrac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}$.

Substituting $\bar\beta$ for $\beta$ in (9) and (10), and letting $\theta_1 = \theta_2 = \theta$, we obtain

(31) $F_{1,n} = \pi_{11} - (\pi_{11} - F_{1,1})(1 - \theta\bar\beta)^{n-1}$

and

(32) $F_{2,n} = \pi_{21} - (\pi_{21} - F_{2,1})[1 - \theta(1-\bar\beta)]^{n-1}$.

From (31) and (32) we can compute expected probabilities of response $A_1$ following $E_1$ and $E_2$ trials by taking account of the incremental or decremental effect of the reinforcing event:

(33) $p_{11,n} = (1 - \theta\pi_{11})F_{1,n-1} + \theta\pi_{11} = \pi_{11} + \theta\pi_{11}(1-\pi_{11}) - (\pi_{11} - F_{1,1})(1 - \theta\pi_{11})(1 - \theta\bar\beta)^{n-2}$,

(34) $p_{21,n} = [1 - \theta(1-\pi_{21})]F_{2,n-1} = \pi_{21} - \theta\pi_{21}(1-\pi_{21}) - (\pi_{21} - F_{2,1})[1 - \theta(1-\pi_{21})][1 - \theta(1-\bar\beta)]^{n-2}$,

and on the average

(35) $p_{1,n} = \bar\beta p_{11,n} + (1-\bar\beta)p_{21,n} = \bar\beta + \theta\,\dfrac{\pi_{21}(1-\pi_{11})(\pi_{11}+\pi_{21}-1)}{1-\pi_{11}+\pi_{21}} - \bar\beta(\pi_{11} - F_{1,1})(1-\theta\pi_{11})(1-\theta\bar\beta)^{n-2} - (1-\bar\beta)(\pi_{21} - F_{2,1})[1-\theta(1-\pi_{21})][1-\theta(1-\bar\beta)]^{n-2}$.

The gist of these results is that the probabilities of $A_1$ on trials following $E_1$ and $E_2$ should tend asymptotically to the conditional probabilities of $E_1$ given $E_1$ and given $E_2$, respectively, plus or minus correction terms which are smaller in magnitude than $\theta$; and the average probability of $A_1$ should tend asymptotically to the average probability of $E_1$, plus or minus a "correction term" which is smaller in magnitude than $\theta$.
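The asymptotic statements in the preceding paragraph can be illustrated numerically. This sketch only evaluates the $n \to \infty$ limits of (30) and (33) through (35) as written above; it does not simulate trace stimuli, and the parameter values are hypothetical.

```python
def trace_asymptotes(theta, pi11, pi21):
    """Limits of (30) and (33)-(35) as n grows."""
    beta_bar = pi21 / (1 - pi11 + pi21)             # (30)
    p11 = pi11 + theta * pi11 * (1 - pi11)           # (33), n -> infinity
    p21 = pi21 - theta * pi21 * (1 - pi21)           # (34), n -> infinity
    p1 = beta_bar * p11 + (1 - beta_bar) * p21       # (35), n -> infinity
    return beta_bar, p11, p21, p1


if __name__ == "__main__":
    print(trace_asymptotes(0.3, 0.8, 0.4))   # hypothetical parameter values
```

Each correction term is a product of $\theta$ with factors no larger than 1 in magnitude, which is why all three deviations stay below $\theta$.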
For purposes of experimental interpretation the functions derived here should be expected to apply when trials are massed and the stimuli associated with $E_1$ and $E_2$ are sufficiently distinct that the assumption of no overlap between $S_1$ and $S_2$ is tenable. As trial spacing increases, we should expect the communality of $S_1$ and $S_2$ to increase until the two traces are indiscriminable. Under the latter condition, the subject is in effect sampling a single stimulus population on all trials, and (33), (34), and (35) reduce to

(36) $p_{11,n} = \bar\beta + \theta(1-\bar\beta) - (\bar\beta - p_{1,1})(1-\theta)^{n-1}$,

(37) $p_{21,n} = \bar\beta - \theta\bar\beta - (\bar\beta - p_{1,1})(1-\theta)^{n-1}$,

and

(38) $p_{1,n} = \bar\beta - (\bar\beta - p_{1,1})(1-\theta)^{n-1}$.

Cases of intermediate spacing should be expected to fall between these two extremes. Explicit expressions for cases involving partial overlap between $S_1$ and $S_2$ can be derived by obvious extensions of the method illustrated here.


The Role of Component Models in Discrimination Theory

How can we characterize the empirical adequacy of the model presented here as a theory of discrimination learning? At a minimum, it provides a quantitative account of data in certain experiments conducted under conditions especially designed to satisfy the simplifying assumptions of the model and predicts some new phenomena, notably those of probabilistic discrimination learning. More generally, the model appears to give a reasonable account of the development of differential S-R correlations under differential reinforcement, of the relation found in some situations between asymptotic accuracy of discriminations and stimulus overlap or "similarity," and of transfer phenomena observed when the components of a discriminative situation are tested in new combinations following the development of discrimination (see, e.g., [7, 12]). The component model does not account for the fact that in some situations subjects, animal or human, are able to achieve essentially perfect discriminations between stimuli which have components or attributes in common.

Formally our stimulus model represents the type of component-sampling model which, with minor variations in detail, underlies numerous contemporary approaches to discrimination theory, [...] Bush and Mosteller [2], Restle [11], and Wyckoff [13] as well [...]. So far as stimulus variables are concerned, our model [...] detail than the others, but we have not attempted to handle effects of [...] or attentional factors [...]. The logical next step would be to examine possible auxiliary hypotheses, for example those relating to "observing responses" or adaptation of irrelevant cues, [...] how the present limited theory may be most effectively [...] a broader range of discrimination experiments.


REFERENCES

[1] Burke, C. J., Estes, W. K., and Hellyer, S. Rate of verbal conditioning in relation to stimulus variability. J. exp. Psychol., 1954, 48, 153-161.

[2] Bush, R. R. and Mosteller, F. A model for stimulus generalization and discrimination. Psychol. Rev., 1951, 58, 413-423.

[3] Estes, W. K. Learning. In Annual review of psychology, 1956, 7, 1-38.

[4] Estes, W. K. Theory of learning with constant, variable, or contingent probabilities of reinforcement. Psychometrika, 1957, 22, 113-132.

[5] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.

[6] Estes, W. K. and Burke, C. J. Application of a statistical model to simple discrimination learning in human subjects. J. exp. Psychol., 1955, 50, 81-88.

[7] Estes, W. K., Burke, C. J., Atkinson, R. C., and Frankmann, J. P. Probabilistic discrimination learning. J. exp. Psychol., 1957, in press.

[8] Estes, W. K. and Straughan, J. H. Analysis of a verbal conditioning situation in terms of statistical learning theory. J. exp. Psychol., 1954, 47, 225-234.

[9] Hake, H. W. and Hyman, R. Perception of the statistical structure of a random series of binary symbols. J. exp. Psychol., 1953, 45, 64-74.

[10] Neimark, Edith D. Effects of type of non-reinforcement and number of alternative responses in two verbal conditioning situations. J. exp. Psychol., 1956, 52, 209-220.

[11] Restle, F. A theory of discrimination learning. Psychol. Rev., 1955, 62, 11-19.

[12] Schoeffler, M. S. Probability of response to compounds of discriminated stimuli. J. exp. Psychol., 1954, 48, 323-329.

[13] Wyckoff, L. B., Jr. The role of observing responses in discrimination learning. Psychol. Rev., 1952, 59, 431-442.


Manuscript received 8/27/56 




SIMPLE PROOFS OF RELATIONS BETWEEN THE
COMMUNALITY PROBLEM AND MULTIPLE CORRELATION*

LOUIS GUTTMAN

THE ISRAEL INSTITUTE OF APPLIED SOCIAL RESEARCH
AND THE
HEBREW UNIVERSITY IN JERUSALEM

Solutions of the communality problem and of the problem of meaning of common and unique factors have been shown previously to depend intimately on certain relations with ordinary multiple correlation. To make these basic propositions more accessible, simple proofs of some of them are provided here, avoiding any matrix algebra. A[...] obtained, with no extra work, that extend the previously known propositions to a more general class o[...].

For any population of individuals and any set of $n$ observed variables, the following inequality is known always to hold. If $\rho_j$ is the multiple correlation coefficient of the $j$th variable on the $n-1$ remaining ones, and if $h_j^2$ is a communality of the $j$th variable, then

(1) $\rho_j^2 \le h_j^2 \qquad (j = 1, 2, \ldots, n)$.

Inequality (1) has been established in several different ways (cf. [1], pp. 92-3; [2], p. 278; [3], p. 293; [...], p. 170).

One important use of the inequality isin conjunction with the Spearman- 
'Thurstone hypothesis that the given correlation matrix results from m 
common factors, where m is much smaller than n. An inequality equivalent 


to (1) is 

1 = hi = 
a L —— < 1 
τ 1-5 


It has been shown in [6] that the arithmetic mean, over $j$, of the left member of inequality (a) satisfies

$$\frac{1}{n}\sum_{j=1}^{n} \frac{1 - h_j^2}{1 - \rho_j^2} \ge 1 - \frac{m}{n}. \tag{b}$$

For example, if 50 tests have only 5 factors in common, so that $m/n = .10$, then according to (b) the mean ratio is not less than .90. This implies that, on the average, $h_j^2$ is not more than .10 larger than $.90\rho_j^2$. Thus, if $\rho_j^2 = .50$, $h_j^2$ is on the average not more than .55.

*Revised from a paper written while on leave at the Center for Advanced Study in the Behavioral Sciences.
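The arithmetic of this example can be sketched as follows, under the same reading the text uses (treating the bound on the mean ratio as a bound for a typical $j$). Rearranging $(1 - h_j^2)/(1 - \rho_j^2) \ge 1 - m/n$ gives $h_j^2 \le m/n + (1 - m/n)\rho_j^2$; the function names are hypothetical.

```python
def mean_ratio_floor(m, n):
    """Lower bound (b) on the mean of (1 - h_j^2) / (1 - rho_j^2)."""
    return 1 - m / n


def h2_ceiling(rho2, m, n):
    """h^2 <= m/n + (1 - m/n) * rho^2, i.e. (b) read for a typical variable."""
    return 1 - mean_ratio_floor(m, n) * (1 - rho2)


print(round(mean_ratio_floor(5, 50), 2))   # 0.9
print(round(h2_ceiling(0.50, 5, 50), 2))   # 0.55
```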

As $m/n$ gets smaller, the bound to the average degree of approximation of $\rho_j^2$ to $h_j^2$ improves; and as $m/n \to 0$ (which will happen if the battery of tests is enlarged and the Spearman-Thurstone hypothesis continues to hold), it must be that $\rho_j^2 \to h_j^2$ for almost all $j$. Conversely, if trial communalities are computed which leave large discrepancies between the left and right members of (1), then either the trial communalities are erroneous or the Spearman-Thurstone hypothesis concerning the smallness of $m/n$ must be rejected.

Another use of inequality (1) is for studying the meaning and determinacy of factor scores (as distinct from factor loadings). Let $\delta_j^2$ be the (nonnegative) difference between the right and left members of (1),

$$\delta_j^2 = h_j^2 - \rho_j^2 \qquad (j = 1, 2, \ldots, n). \tag{c}$$


It has been shown in [3] that $\delta_j^2$ is the variance of the difference between the $j$th unique factor scores and the errors of estimate of the $j$th observed variable from the multiple regression on the $n - 1$ remaining observed variables (i.e., the $j$th anti-image scores). Therefore, if $\rho_j^2$ is very close to $h_j^2$, $\delta_j^2$ must be close to zero, and the $j$th unique factor scores must be essentially equal to the $j$th anti-image scores. This provides a unique meaning for, and determination of, "unique" factors in such cases.

A third important context in which inequalities (1) and (a) loom large is where $h_j^2$ differs substantially from $\rho_j^2$. It has been shown in [1] and [5] that the left member of (a) above equals $r_j^2$, where $r_j$ is the multiple correlation coefficient of the $j$th unique factor on the $n$ observed variables. Thus, if $h_j^2 = .7$ and $\rho_j^2 = .5$, then $r_j^2 = (1 - .7)/(1 - .5) = .6$. In such a case, only 60 per cent of the total variance of the $j$th unique factor is linearly determined by the $n$ observed variables. Consequently, the unique factor is hardly "unique." Many different sets of unique factor scores can be found to yield the same communality of .7 for the $j$th variable and to satisfy all other conditions for unique factor scores (i.e., correlate zero with unique factor scores for $k \ne j$ and with all common factor scores). Furthermore, to any one such legitimate set of scores there corresponds another which is equally legitimate for the same data and the same $j$th observed variable, yet the correlation between the two sets is only .20. Should $r_j^2$ equal .5 instead of .6, then two different "unique" factors can always be found to fit the data legitimately for the same $j$th variable, and yet correlate exactly zero with each other.

This multiplicity of solutions for unique variables for the same observations has been analyzed in detail in [5]. It is also shown there how a similar multiplicity of solutions holds for each common factor separately, given any fixed set of common factor loadings, when the left and right members of (1) are substantially disparate for many values of $j$. The widespread practice of trying to name or attach meaning to factors merely by studying factor loadings is clearly suspect if the same loadings can be derived equally well from radically different sets of factor scores.
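The two numerical claims in the paragraph on unique factors can be reproduced with a short sketch. The first function is simply the left member of (a). The second assumes the result from [5] that the minimum correlation between two legitimate sets of unique factor scores is $2r_j^2 - 1$; that formula is not written out in this paper, but it agrees with the figures .20 and 0 quoted above. Both function names are hypothetical.

```python
def unique_factor_r2(h2, rho2):
    """Left member of (a): squared multiple correlation r_j^2 of the jth unique factor."""
    return (1 - h2) / (1 - rho2)


def min_alternative_correlation(r2):
    """Assumed result from [5]: lowest correlation between two legitimate unique factors."""
    return 2 * r2 - 1


r2 = unique_factor_r2(0.7, 0.5)
print(round(r2, 2), round(min_alternative_correlation(r2), 2))  # 0.6 0.2
print(min_alternative_correlation(0.5))                          # 0.0
```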

In view of the demonstrated importance of inequality (1) for the communality problem and for the problem of assigning meaning to common and unique factors, it may be desirable to make its proof more readily accessible. The purpose of the present paper is to provide a simpler and more general proof of (1), avoiding any matrix algebra. This may help clarify the communality problem. Furthermore, the proof will extend (1) to the case where the right member belongs to a more general class of coefficients than communalities. Equality (c) above is but a special case of a formula of the same form that will be established for the more general deviation law. Its proof is what yields (1) as a corollary (cf. [3], p. 293). Similarly, a simple proof of the formula for $r_j^2$ will be given in a more general context.


The ζ-Law of Deviation

We shall define a general law of deviation which is satisfied both by the errors of estimate from multiple regressions (anti-image scores) and by the unique factor scores of communality theory. This we shall call the ζ-law, in distinction to other deviation laws defined in [4].

Let $x_j$ denote the $j$th observed variable, and $x_{ji}$ the score of individual $i$ on $x_j$ $(j = 1, 2, \ldots, n)$. Without loss of generality for our purposes, the expected values (arithmetic means) of all variables concerned may be set equal to zero,

$$E\,x_{ji} = 0 \qquad (j = 1, 2, \ldots, n). \tag{2}$$


There are many ways of expressing each $x_j$ as the sum of two uncorrelated components. Let $y_j$ and $z_j$ be any two such components, so that

$$x_{ji} = y_{ji} + z_{ji} \qquad (j = 1, 2, \ldots, n) \tag{3}$$

and

$$E\,y_{ji} z_{ji} = 0 \qquad (j = 1, 2, \ldots, n). \tag{4}$$

An immediate consequence of (4) and (3) is that

$$\sigma_{x_j}^2 = \sigma_{y_j}^2 + \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{5}$$

Of special interest is the class of cases where $z_j$ correlates zero with $x_k$ whenever $j \ne k$,

$$E\,z_{ji} x_{ki} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n). \tag{6}$$




Any set of components for the given $x_j$ which satisfy (3), (4), and (6) will be said to obey the ζ-law of deviation; $z_j$ will be regarded as the deviant portion of $x_j$ and $y_j$ as the non-deviant portion. Condition (6) alone has been called the α-law of deviation [4], so the ζ-law is a special case of the α-law.

Regression errors of estimate and unique factors both can serve as deviant components under the ζ-law. Other types of components than these also satisfy this law. However, the variance of errors of estimate of $x_j$ can never be smaller than the variance of any other $z_j$ under the law. Therefore, if $\eta_j^2$ is defined by

$$\eta_j^2 = \sigma_{y_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n), \tag{7}$$

then from (5)

$$\rho_j^2 \le \eta_j^2 \qquad (j = 1, 2, \ldots, n). \tag{8}$$

Since $h_j^2$ will be found to be an example of an $\eta_j^2$, (1) is but a special case of (8).


The Multiple Regressions 


Let $w_{jk}$ be the multiple regression weight of $x_k$ for predicting $x_j$, and let $p_{ji}$ be the predicted value, or image, of $x_{ji}$,

$$p_{ji} = \sum_{k=1}^{n} w_{jk} x_{ki} \qquad (j = 1, 2, \ldots, n), \tag{9}$$

it being understood that a variable is not used to predict itself, or

$$w_{jj} = 0 \qquad (j = 1, 2, \ldots, n). \tag{10}$$

If $e_{ji}$ is the error of prediction, or anti-image, of $x_{ji}$, then

$$x_{ji} = p_{ji} + e_{ji}. \tag{11}$$

As is well known, in order that $\sigma_{e_j}^2$ be a minimum among error variances from all possible linear estimates of $x_j$ from the $n - 1$ remaining $x_k$, it is necessary and sufficient that $e_j$ correlate zero with each $x_k$ where $k \ne j$, or

$$E\,e_{ji} x_{ki} = 0 \qquad (k \ne j;\ j, k = 1, 2, \ldots, n). \tag{12}$$

A simple proof of this classical theorem is given below, in the section on the minimizing and maximizing properties of the anti-images. Multiplying (9) through by $e_{ji}$, taking expectations over $i$, and using (10) and (12) yield the result that an image and its own anti-image are always uncorrelated,

$$E\,p_{ji} e_{ji} = 0 \qquad (j = 1, 2, \ldots, n). \tag{13}$$

Therefore, the set of all images and anti-images obeys the ζ-law, for (11), (13), and (12) are special cases of (3), (4), and (6), respectively, setting $p_j = y_j$ and $e_j = z_j$.
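The claim that images and anti-images obey the ζ-law can be checked numerically from a covariance matrix alone, since $E\,e_{ji}x_{ki}$ and $E\,p_{ji}e_{ji}$ are linear in the population covariances. The sketch below assumes an arbitrary positive definite 3-by-3 correlation matrix chosen only for illustration; `regression_weights` and `anti_image_moments` are hypothetical helper names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_weights(cov, j):
    """Weights w_jk for predicting x_j from the other variables, with w_jj = 0 as in (10)."""
    n = len(cov)
    others = [k for k in range(n) if k != j]
    a = [[cov[r][c] for c in others] for r in others]
    b = [cov[r][j] for r in others]
    w = [0.0] * n
    for k, wk in zip(others, solve(a, b)):
        w[k] = wk
    return w


def anti_image_moments(cov, j):
    """Return E e_j x_k for every k, and E p_j e_j, from population covariances."""
    n = len(cov)
    w = regression_weights(cov, j)
    # e_j = x_j - sum_l w_jl x_l, so E e_j x_k = cov[j][k] - sum_l w_jl cov[l][k]
    cov_e_x = [cov[j][k] - sum(w[l] * cov[l][k] for l in range(n)) for k in range(n)]
    # p_j = sum_k w_jk x_k, so E p_j e_j = sum_k w_jk E x_k e_j
    cov_p_e = sum(w[k] * cov_e_x[k] for k in range(n))
    return cov_e_x, cov_p_e


cov = [[1.0, 0.5, 0.3],
       [0.5, 1.0, 0.4],
       [0.3, 0.4, 1.0]]
for j in range(len(cov)):
    cov_e_x, cov_p_e = anti_image_moments(cov, j)
    # Off-diagonal entries of cov_e_x realize (12); cov_p_e realizes (13).
    print(j + 1, [round(c, 6) for c in cov_e_x], round(cov_p_e, 6))
```

The diagonal entry `cov_e_x[j]` is the anti-image variance, which is the only moment that stays away from zero.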




Equalities and Inequalities for the ζ-law

More important than the fact that multiple regression components satisfy the ζ-law is the role they play in setting bounds for any components satisfying this law. The following basic formulas will be established:

$$\sigma_{e_j}^2 = \sigma_{z_j}^2 + \sigma_{e_j - z_j}^2 \qquad (j = 1, 2, \ldots, n) \tag{14}$$

and

$$\sigma_{y_j}^2 = \sigma_{p_j}^2 + \sigma_{y_j - p_j}^2 \qquad (j = 1, 2, \ldots, n), \tag{15}$$

where $y_j$ and $z_j$ are any components satisfying the ζ-law, and $e_j$ and $p_j$ are uniquely determined by the $n$ observed $x_j$.

First, note that not only $e_j$, but any $z_j$, correlates zero with $p_j$,

$$E\,p_{ji} z_{ji} = 0 \qquad (j = 1, 2, \ldots, n). \tag{16}$$

This follows from multiplying (9) through by $z_{ji}$, taking expectations, and using (10) and (6). Indeed, (13) is but a special case of (16). Therefore, from (4) and (16),

$$E\,z_{ji}(p_{ji} - y_{ji}) = 0 \qquad (j = 1, 2, \ldots, n). \tag{17}$$

But from (3) and (11),

$$p_{ji} - y_{ji} = z_{ji} - e_{ji}, \tag{18}$$

so (17) is equivalent to

$$E\,z_{ji}(e_{ji} - z_{ji}) = 0 \qquad (j = 1, 2, \ldots, n). \tag{19}$$

If we write the tautology

$$e_{ji} = z_{ji} + (e_{ji} - z_{ji}), \tag{20}$$

then (19) and (20) show that $e_j$ can be partitioned into two uncorrelated components, $z_j$ and $e_j - z_j$. Taking squares and expectations in (20) and using (19) yield (14).

It is similarly true that

$$y_{ji} = p_{ji} + (y_{ji} - p_{ji}); \tag{21}$$

thus $y_j$ is partitioned into two uncorrelated components, and (15) follows. Explicit proof will be left to the reader.

From (14) and (15) follow the basic inequalities

$$\sigma_{z_j}^2 \le \sigma_{e_j}^2, \qquad \sigma_{p_j}^2 \le \sigma_{y_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{22}$$

For a given value of $j$, the equalities in (22) will hold if and only if $\sigma_{e_j - z_j}^2 = \sigma_{y_j - p_j}^2 = 0$. [It is of course always true that $\sigma_{e_j - z_j}^2 = \sigma_{y_j - p_j}^2$, by virtue of (18).]



As is well known, the multiple correlation coefficient $\rho_j$ for $x_j$ from its regression on the $n - 1$ remaining observed variables is given by the formula

$$\rho_j^2 = \sigma_{p_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{23}$$

Thus, $\rho_j^2$ is but a special kind of $\eta_j^2$, in view of (7). Dividing the second part of (22) through by $\sigma_{x_j}^2$ and using (7) and (23) yields (8). The equality in (8) for a given $j$ will hold if and only if $\sigma_{y_j - p_j}^2 = 0$ for that $j$.


The Case of Common and Unique Components

It remains to be shown that a communality $h_j^2$ is a variety of $\eta_j^2$, and the task of proving (1) will be completed.

The communality problem arises from the definition of unique components. A set of $n$ variables $u_j$ $(j = 1, 2, \ldots, n)$ is called by factor analysts (following Thurstone) a set of unique factors or components for the observed $x_j$ if and only if they satisfy the so-called β-law [cf. 4]

$$E\,u_{ji} c_{ki} = 0 \qquad (j, k = 1, 2, \ldots, n), \tag{24}$$

where

$$c_{ji} = x_{ji} - u_{ji}, \tag{25}$$

and also the γ-law [cf. 4]

$$E\,u_{ji} u_{ki} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n). \tag{26}$$

The $c_j$ are the common parts of the $x_j$, and the dimensionality of the $c_j$ has in the past been [...]. Condition (26) states that the unique components are uncorrelated among themselves. The communality $h_j^2$ for $x_j$ from a given set of unique components is defined as the ratio

$$h_j^2 = \sigma_{c_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{27}$$

Similarly, the uniqueness of $x_j$ is defined as $\sigma_{u_j}^2 / \sigma_{x_j}^2$.

The case of (24) when $k = j$ shows that the $u_j$ and $c_j$ satisfy condition (4) of the ζ-law. Multiplying (25) through by $u_{ki}$, taking expectations over $i$, and using (24) and (26) show that

$$E\,u_{ki} x_{ji} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n). \tag{28}$$

Interchanging subscripts $j$ and $k$ in (28) does not change the equality, so the $u_j$ play the role of the $z_j$ in (6). Consequently, the $c_j$ are non-deviant components $y_j$ under the ζ-law, and comparison of (27) with (7) shows that $h_j^2$ is a special case of $\eta_j^2$.



It might be remarked that the δ-law of deviation can be defined as a combination of the β- and γ-laws [4]. Thus, a necessary and sufficient condition that a set of components be unique factors for the observed $x_j$ is that it satisfy the δ-law. The above proof shows that the δ-law is a special case of the ζ-law: any set of components satisfying the δ-law must also satisfy the ζ-law. But not all components satisfying the ζ-law need satisfy the δ-law; for example, the $e_j$ and $p_j$ regression components satisfy one law but not the other.


Minimizing and Maximizing Properties of the Anti-Images

It is a curious fact that among all $z_j$ satisfying the ζ-law, $e_j$ has the largest possible variance, while $\sigma_{e_j}^2$ is the smallest possible variance of errors for estimating $x_j$ as a linear function of the remaining $n - 1$ observed $x_k$. In one context, $\sigma_{e_j}^2$ is a maximum; in the other, it is a minimum.

It may be instructive to give a simple algebraic proof that condition (12) is necessary and sufficient for the minimizing regression problem, along the lines of the previous argument concerning the ζ-law. Let $w_{jk}^*$ be an arbitrary set of real numbers (weights), except that

$$w_{jj}^* = 0 \qquad (j = 1, 2, \ldots, n). \tag{29}$$

Let $p_{ji}^*$ be defined by

$$p_{ji}^* = \sum_{k=1}^{n} w_{jk}^* x_{ki}, \tag{30}$$

and $e_{ji}^*$ by

$$e_{ji}^* = x_{ji} - p_{ji}^*. \tag{31}$$

Thus, $p_j^*$ is an arbitrary estimate of $x_j$ as a linear function of the remaining $x_k$. Given that $e_j$ satisfies (12), multiply (30) through by $e_{ji}$ and take expectations over $i$ to see that

$$E\,p_{ji}^* e_{ji} = 0 \qquad (j = 1, 2, \ldots, n). \tag{32}$$

From (13) and (32),

$$E\,e_{ji}(p_{ji} - p_{ji}^*) = 0 \qquad (j = 1, 2, \ldots, n). \tag{33}$$

But from (11) and (31),

$$p_{ji} - p_{ji}^* = e_{ji}^* - e_{ji}, \tag{34}$$

so (33) becomes

$$E\,e_{ji}(e_{ji}^* - e_{ji}) = 0 \qquad (j = 1, 2, \ldots, n). \tag{35}$$

Then consider the tautology

$$e_{ji}^* = e_{ji} + (e_{ji}^* - e_{ji}); \tag{36}$$

this expresses $e_j^*$ as the sum of two uncorrelated components according to (35), whence

$$\sigma_{e_j^*}^2 = \sigma_{e_j}^2 + \sigma_{e_j^* - e_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{37}$$

Clearly then

$$\sigma_{e_j}^2 \le \sigma_{e_j^*}^2 \qquad (j = 1, 2, \ldots, n), \tag{38}$$

the equality holding for any $j$ if and only if $\sigma_{e_j^* - e_j}^2 = 0$. Hence, (12) is both necessary and sufficient for $\sigma_{e_j}^2$ to be a minimum.

This proof of (38) is not only simpler than the usual one, which involves the partial differential calculus, but is more complete. Sufficiency of (12) has been established here as well as necessity, with exact formula (37) for what happens when the best linear estimates are not used.
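Formula (37) can be verified numerically for any choice of weights obeying (29). The sketch below works with population covariances only; the matrix and the trial weight vectors `w_star` are hypothetical values chosen for illustration, and `quad` and `error_variance` are illustrative helper names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_weights(cov, j):
    """Least-squares weights for x_j on the other variables, with w_jj = 0."""
    n = len(cov)
    others = [k for k in range(n) if k != j]
    a = [[cov[r][c] for c in others] for r in others]
    b = [cov[r][j] for r in others]
    w = [0.0] * n
    for k, wk in zip(others, solve(a, b)):
        w[k] = wk
    return w


def quad(cov, u, v):
    """u' C v for coefficient vectors u and v."""
    n = len(cov)
    return sum(u[r] * cov[r][c] * v[c] for r in range(n) for c in range(n))


def error_variance(cov, j, weights):
    """Variance of x_j - sum_k weights[k] x_k (weights[j] is expected to be 0, as in (29))."""
    coef = [(1.0 if k == j else 0.0) - weights[k] for k in range(len(cov))]
    return quad(cov, coef, coef)


cov = [[1.0, 0.5, 0.3],
       [0.5, 1.0, 0.4],
       [0.3, 0.4, 1.0]]
j = 0
w = regression_weights(cov, j)  # weights satisfying (12)
checks = []
for w_star in ([0.0, 0.2, 0.9], [0.0, -0.3, 0.1], [0.0, 1.0, 0.0]):  # hypothetical trial weights
    diff = [a - b for a, b in zip(w, w_star)]  # e* - e = p - p* = sum_k (w_jk - w*_jk) x_k, by (34)
    lhs = error_variance(cov, j, w_star)
    rhs = error_variance(cov, j, w) + quad(cov, diff, diff)  # right member of (37)
    checks.append((lhs, rhs, error_variance(cov, j, w)))
    print(f"{lhs:.6f} = {rhs:.6f}")
```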


The Determinacy Problem of the Common and Unique Components

The ζ-law alone does not lead to a unique definition for deviant components for the $x_j$. Further restrictions are needed for this. If it is required additionally that $y_j$ be a linear function of the $n - 1$ $x_k$ for which $k \ne j$, then the only possible $y_j$ and $z_j$ are $p_j$ and $e_j$, respectively $(j = 1, 2, \ldots, n)$. This can be seen, for example, by regarding inequalities (22) and (38) simultaneously; when $y_j$ is such a linear function, $z_j$ is one of the $e_j^*$ of (31), so that

$$\sigma_{z_j}^2 \le \sigma_{e_j}^2 \le \sigma_{z_j}^2. \tag{39}$$

Only $e_j$ can be an $e_j^*$ and a $z_j$ simultaneously, since an equality in (39) holds if and only if the variance of the difference of $e_j$ from the component in question vanishes.

On the other hand, if it is required instead that the $z_j$ be uncorrelated among themselves (implying that the δ-law holds), this does not pin down the components; infinitely many satisfactory sets remain, as is well known in common-factor theory. Furthermore, the various possible components [...] determined by the observations. [...] functions of the observed [...] from the observations. Let us now [...] and into the predictability of ζ-law components. It again turns out that the special case of images and [...]

We shall prove that, if $r_j$ is the multiple correlation coefficient of $z_j$ from the $n$ observed variables (where now $x_j$ is included in the regression), and if $\sigma_{z_j} > 0$, then

$$r_j^2 = \sigma_{z_j}^2 / \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{40}$$




If $\sigma_{z_j} > 0$, it is always true that $\sigma_{e_j} > 0$, from (22), so the denominator on the right of (40) cannot vanish. Only the case of non-vanishing $\sigma_{z_j}$ is of interest for our regression problem. If $\sigma_{z_j} \ne 0$, then (40) shows that $r_j^2 = 1$ if and only if $\sigma_{z_j}^2 = \sigma_{e_j}^2$, which implies, as we have repeatedly seen from (14), that $\sigma_{e_j - z_j}^2 = 0$, or that $z_j$ is essentially the same as $e_j$. In other words, we shall have

THEOREM 1. Among all possible deviant components satisfying the ζ-law and which have positive variance, essentially the only ones which can be linear functions of the $n$ observed variables are the anti-images.

To prove these results, explicit formulas for the regression of each $z_j$ on the $n$ observed variables will be established. Let $\omega_{jk}$ be the multiple regression coefficient of $x_k$ for predicting $z_j$, and let $\hat z_{ji}$ be the predicted value of $z_{ji}$,

$$\hat z_{ji} = \sum_{k=1}^{n} \omega_{jk} x_{ki}. \tag{41}$$

It will now be immediately seen that $\hat z_j$ is exactly related to $e_j$ by the formula

$$\hat z_{ji} = \frac{\sigma_{z_j}^2}{\sigma_{e_j}^2}\, e_{ji} \qquad (\sigma_{e_j} > 0). \tag{42}$$

The right member of (42) is clearly a linear function of the $x_k$, and so must be of the form (41). Then all we need to show, according to the proof in the last section (which holds for any multiple regression problem), is that the errors of estimate $z_j - \hat z_j$ resulting from (42) correlate zero with each predictor, analogously to (12),

$$E\,(z_{ji} - \hat z_{ji})\, x_{ki} = 0 \qquad (j, k = 1, 2, \ldots, n). \tag{43}$$

When $j \ne k$, (43) obviously holds from (6), (12), and (42). So we need verify only the part where $j = k$. Multiplying (3) through by $z_{ji}$, taking expectations, and using (4) show that

$$E\,z_{ji} x_{ji} = \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{44}$$

As a special case of (44), since $e_j$ is a $z_j$,

$$E\,e_{ji} x_{ji} = \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{45}$$

Multiplying (42) through by $x_{ji}$, taking expectations, and using (45) yield

$$E\,\hat z_{ji} x_{ji} = \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{46}$$

Then (43) follows from (44) and (46).




Squaring both members of (42) and taking expectations show that

$$\sigma_{\hat z_j}^2 = \sigma_{z_j}^4 / \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{47}$$

Analogously to (23), the multiple correlation coefficient for predicting $z_j$ is given by

$$r_j^2 = \sigma_{\hat z_j}^2 / \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n). \tag{48}$$

Therefore, dividing (47) through by $\sigma_{z_j}^2$ and using (48) yield (40).

It may be noted that (42) and (40) imply that the $\omega_{jk}$ themselves in (41) can be related directly to the $w_{jk}$ of (9) by the formula

$$\omega_{jk} = -r_j^2\, w_{jk} \qquad (j \ne k;\ j, k = 1, 2, \ldots, n). \tag{49}$$

However, in contrast to (10) for $j = k$,

$$\omega_{jj} = r_j^2 \qquad (j = 1, 2, \ldots, n). \tag{50}$$

While $z_j$ is uncorrelated with each $x_k$ for which $k \ne j$, from the definition of the ζ-law, nevertheless $\omega_{jk}$ in general does not vanish when $j \ne k$, according to (49). The variables $x_k$ with which $z_j$ is uncorrelated act as "suppressor" variables in aiding $x_j$ to estimate $z_j$.
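Formulas (40), (49), and (50) can be checked together by regressing a unique factor on all $n$ observed variables. The sketch assumes the same hypothetical one-factor model as a convenient source of ζ-law components with known covariances ($E\,u_{ji}x_{ki} = \psi_j = 1 - l_j^2$ for $k = j$, and 0 otherwise); it is an illustration under that assumption, not a procedure from the paper, and `unique_factor_regression` is a hypothetical name.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def one_factor_cov(loadings):
    """Correlation matrix implied by a single common factor."""
    n = len(loadings)
    return [[1.0 if j == k else loadings[j] * loadings[k] for k in range(n)]
            for j in range(n)]


def regression_weights(cov, j):
    """Weights w_jk of (9) for x_j on the other variables, with w_jj = 0."""
    n = len(cov)
    others = [k for k in range(n) if k != j]
    a = [[cov[r][c] for c in others] for r in others]
    b = [cov[r][j] for r in others]
    w = [0.0] * n
    for k, wk in zip(others, solve(a, b)):
        w[k] = wk
    return w


def unique_factor_regression(loadings, j):
    """Regress z_j = u_j on all n variables under the assumed one-factor model."""
    cov = one_factor_cov(loadings)
    n = len(cov)
    psi = 1 - loadings[j] ** 2                      # var(z_j)
    g = [psi if k == j else 0.0 for k in range(n)]  # E x_k z_j (assumed model)
    omega = solve(cov, g)                           # omega_jk of (41), x_j included
    var_zhat = sum(o * gk for o, gk in zip(omega, g))
    r2 = var_zhat / psi                             # (48)
    w = regression_weights(cov, j)
    var_e = cov[j][j] - sum(w[k] * cov[k][j] for k in range(n))
    return r2, psi / var_e, omega, w


loadings = [0.8, 0.7, 0.6, 0.5]
for j in range(len(loadings)):
    r2, ratio, omega, w = unique_factor_regression(loadings, j)
    print(f"j={j + 1}: r^2={r2:.4f}, var(z)/var(e)={ratio:.4f}, omega_jj={omega[j]:.4f}")
```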

Formulas such as (40) and (49) have been established previously by several writers, largely by the use of determinants and/or matrix algebra, for the special case of unique factors (see [5] and the references therein). They hold more generally for any $z_j$ of the ζ-law. It is a remarkable fact that each of the infinitely many possible $z_j$ (of positive variance) has estimates from the $n$ observed variables that differ from each other only by the constant of proportionality $r_j^2$, according to (42) and (40). As $r_j^2 \to 1$, or $\hat z_j \to z_j$, it must be that $z_j \to e_j$ according to (42), and hence $\sigma_{e_j - z_j}^2 \to 0$. The only way for a deviant component (of positive variance) to be linearly determinate under the ζ-law is that it have the corresponding total anti-image as a limit [cf. 3].

Analogous considerations hold for the common components. We shall not go into detail here; the interested reader may wish to see the related discussion for the case of the β-law in [5].


Compact Formulation of the ζ-Law


In closing, it may be useful to state the ζ-law in a more compact form. For the given $n$ observed $x_j$, a set of variables $z_j$ $(j = 1, 2, \ldots, n)$ will be said to satisfy the ζ-law if and only if

$$E\,z_{ji} x_{ki} = \delta_{jk}\,\sigma_{z_j}^2 \qquad (j, k = 1, 2, \ldots, n), \tag{51}$$

where $\delta_{jk}$ is Kronecker's delta,

$$\delta_{jk} = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases} \tag{52}$$




It is easily verified that (51) is equivalent to (3), (4), and (6) combined. Condition (6) is directly contained in (51), so all that is needed is to consider (51) for the case $j = k$. Define $y_j$ by the identity

$$y_{ji} = x_{ji} - z_{ji}. \tag{53}$$

This, then, is equivalent to (3). Multiply (53) through by $z_{ji}$, take expectations over $i$, and use (51) to see that (4) is satisfied. Hence, (51) is sufficient for (3), (4), and (6). That (51) is also necessary follows from (6) and (44).

The fact that the whole concept of deviance is wrapped up in the $z_j$, as in (51), without direct reference to the non-deviant $y_j$, may help explain difficulties attending the communality problem in common-factor analysis. Past attempts to solve the problem have largely focused on the $y_j$, that is, the $c_j$, rather than on the $z_j$ (the $u_j$). The dimensionality of the $c_j$ has often been taken as a point of departure, so that communalities have often been thought of in terms of reducing ranks of correlation matrices. But many aspects of the problem can be studied without considering dimensionality at all, as in the present paper. In particular, communalities can possibly be uniquely defined by considering only the ζ-law, and requiring only determinacy of the $z_j$ in the limit as $n \to \infty$. For cases where such determinacy holds, it must be that $h_j^2$ is the limit of $\rho_j^2$ as $n \to \infty$. No preliminary considerations of rank are required for such a conclusion, but analysis only of the law of deviation involved.

Implications of lack of determinacy have been rather completely analyzed in [5] for the β-law, including the δ-law of common-factor theory.
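Condition (51) lends itself to a direct check once the cross-moments $E\,z_{ji}x_{ki}$ and the variances $\sigma_{z_j}^2$ are known. The sketch below, with hypothetical helper names and an assumed one-factor covariance structure, applies such a check to the anti-images, to the unique factors of that model, and to a deliberately invalid candidate $z_j = x_j/2$.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def one_factor_cov(loadings):
    """Correlation matrix implied by a single common factor."""
    n = len(loadings)
    return [[1.0 if j == k else loadings[j] * loadings[k] for k in range(n)]
            for j in range(n)]


def regression_weights(cov, j):
    """Weights for x_j on the other variables, with w_jj = 0."""
    n = len(cov)
    others = [k for k in range(n) if k != j]
    a = [[cov[r][c] for c in others] for r in others]
    b = [cov[r][j] for r in others]
    w = [0.0] * n
    for k, wk in zip(others, solve(a, b)):
        w[k] = wk
    return w


def satisfies_zeta_law(cov_zx, var_z, tol=1e-10):
    """Check (51): E z_j x_k equals var(z_j) when j == k and 0 otherwise."""
    n = len(var_z)
    return all(abs(cov_zx[j][k] - (var_z[j] if j == k else 0.0)) <= tol
               for j in range(n) for k in range(n))


loadings = [0.8, 0.7, 0.6, 0.5]
cov = one_factor_cov(loadings)
n = len(cov)

# Candidate 1: anti-images e_j, with moments computed from the regression weights.
anti_cov, anti_var = [], []
for j in range(n):
    w = regression_weights(cov, j)
    coef = [(1.0 if k == j else 0.0) - w[k] for k in range(n)]  # e_j as a combination of the x's
    anti_cov.append([sum(coef[l] * cov[l][k] for l in range(n)) for k in range(n)])
    anti_var.append(sum(coef[r] * cov[r][c] * coef[c] for r in range(n) for c in range(n)))

# Candidate 2: unique factors of the assumed model (E u_j x_k = psi_j if j == k, else 0).
psi = [1 - l ** 2 for l in loadings]
uniq_cov = [[psi[j] if j == k else 0.0 for k in range(n)] for j in range(n)]

# Candidate 3 (invalid on purpose): z_j = x_j / 2.
half_cov = [[c / 2 for c in row] for row in cov]
half_var = [cov[j][j] / 4 for j in range(n)]

print(satisfies_zeta_law(anti_cov, anti_var))  # True
print(satisfies_zeta_law(uniq_cov, psi))       # True
print(satisfies_zeta_law(half_cov, half_var))  # False
```

Note that the check uses only the $z_j$ and their moments with the $x_k$, never the $y_j$, which mirrors the point made in the text about (51).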


REFERENCES

[1] Guttman, L. Multiple rectilinear prediction and the resolution into components. Psychometrika, 1940, 5, 79-99.
[2] Guttman, L. A basis for analyzing test-retest reliability. Psychometrika, 1945, 10, 255-282.
[3] Guttman, L. Image theory for the structure of quantitative variates. Psychometrika, 1953, 18, 277-296.
[4] Guttman, L. A new approach to factor analysis: the radex. In P. F. Lazarsfeld (Ed.), Mathematical thinking in the social sciences. Glencoe, Ill.: The Free Press, 1954; esp. pp. 308-9.
[5] Guttman, L. The determinacy of factor score matrices with implications for five other basic problems of common-factor theory. Brit. J. statist. Psychol., 1955, 8, 65-81.
[6] Guttman, L. "Best possible" systematic estimates of communalities. Psychometrika, 1956, 21, 273-285.
[7] Roff, M. Some properties of the communality in multiple-factor theory. Psychometrika, 1936, 1, 1-6.


Manuscript received 9/4/56 


Revised manuscript received 11/21/56


PSYCHOMETRIKA —VOL. 22, No. 2 
JUNE, 1957


A GENERAL LEAST SQUARES SOLUTION 
FOR SUCCESSIVE INTERVALS* 


GERTRUDE W. DIEDERICH


PRINCETON UNIVERSITY 


AND 


SAMUEL J. MESSICK AND LEDYARD R TUCKER 


PRINCETON UNIVERSITY AND EDUCATIONAL TESTING SERVICE 
A general least squares solution for successive intervals is presented, along with iterative procedures for obtaining scale values, discriminal dispersions, and category boundaries. Weights were incorporated into the derivation, so that it applies without loss of rigor to the typical experimental matrix of incomplete data, i.e., to a data matrix with missing entries, as well as to the rarely occurring matrix of complete data. The use of weights also allows for variations in the reliability of estimates obtained from the data. The computational steps involved in the solution are enumerated, the amount of labor required comparing favorably with other procedures. A quick, yet accurate, graphical approximation suggested by the derivation is also described.


Since Thurstone first developed the scaling method of successive intervals, it has appeared in several essentially identical forms under various names, such as equal discriminability scaling [6] and graded dichotomies [1]. The procedure was first published as a psychological scaling method by Saffir [15], the basic rationale having been previously presented by Thurstone in his absolute scaling of psychological tests [16, 18].

The experimental procedure for the method of successive intervals requires $n$ stimuli to be sorted into $(k + 1)$ categories on some attribute continuum. This procedure yields a frequency distribution for each stimulus over several of the categories. The basic consideration in successive intervals scaling is whether or not these frequency distributions can be simultaneously converted to a common distribution, allowing unequal means and variances, on the same base line. The means of the converted distributions would then correspond to stimulus scale values and the standard deviations to what Thurstone has called "discriminal dispersions" [17]. Scale values for category boundaries are also obtained from the method of successive intervals, thus permitting estimates of the size of categories rather than assuming them to be equal as in the method of equal-appearing intervals [19].

*This research was jointly supported in part by Princeton University, the Office of Naval Research under contract [...], in part by the National Science Foundation under grant NSF G-642, and in part by Educational Testing Service.


Solutions to the Scaling Problem 


Successive intervals solutions for the $n$ stimulus values have been suggested by Saffir [15], Guilford [9], Mosier [13], Bishop [2], Attneave [1], Garner and Hake [6], Edwards [3], Gulliksen [10], and Rimoldi [14]. Some of these articles also offer solutions for the $n$ discriminal dispersions and the $k$ category boundaries. The procedures vary in computational routine and with respect to certain restricting assumptions, but they are essentially equivalent. These procedures involve obtaining the proportion, $p_{ig}$, of times stimulus $i$ was placed below the $g$th category boundary, $t_g$, and these cumulative proportions are then usually converted into normal deviate values, $z_{ig}$. Various successive intervals solutions presented in the literature have used normal curve transformations, but any similar function giving a one-to-one correspondence between $p_{ig}$ and $z_{ig}$ could be used (see [10]).

Category boundaries on the attribute scale may be expressed in terms of normal deviate values as follows:

$$t_g = m_i + s_i z_{ig}, \tag{1}$$

where $z_{ig}$ = the normal deviate value corresponding to a cumulated proportion, $p_{ig}$,
$t_g$ = the upper boundary of the $g$th category,
$m_i$ = the scale value of stimulus $i$, and
$s_i$ = the discriminal dispersion for stimulus $i$.

This equation is what Torgerson calls a special case of the Law of Categorical Judgment [20]. Algebraic solutions for $m_i$, $s_i$, and $t_g$ can be obtained from this relationship by arbitrarily choosing one of the $s_i$ values as a unit and one of the $m_i$ values or their average as an origin.

Gulliksen [10] derived an explicit least squares solution for $m_i$, $s_i$, and $t_g$ by minimizing the following error term:

$$E = \sum_{i=1}^{n} \sum_{g=1}^{k} (m_i + s_i z_{ig} - t_g)^2. \tag{2}$$

The following restrictions, which fix an origin, $a$, and a unit, $b$ (an arbitrary scale factor), were attached to the function to be minimized by Lagrange multipliers:

$$\sum_{g=1}^{k} t_g = ka \tag{3}$$

and

$$\sum_{g=1}^{k} t_g^2 = k(a^2 + b^2), \tag{4}$$


where $k$ is the number of category boundaries, i.e., one less than the number of categories. These restrictions place the mean scale value of the category boundaries at $a$ and their standard deviation at $b$.


A General Least Squares Solution 


This paper is concerned with a generalization of Gulliksen's successive intervals solution [10]. In an attempt to obtain a least squares procedure that would apply equally well to data with either complete or incomplete overlap, a weighting system was incorporated into the present derivation. Thus the "incomplete overlap" situation could be easily handled without loss of rigor by assigning weights of zero to the missing entries. As it turned out, the present iterative solution also involves a computational routine that is not excessively laborious and for which punched card procedures are appropriate [12].

The derivation proceeds by minimizing the following error term:

$$E = \sum_{i=1}^{n} \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)^2, \tag{5}$$

where $w_{ig}$ is a weight that may be chosen in any fashion as long as $w_{ig} \ge 0$, and $w_{ig} = 0$ when $p_{ig} = 0$ or $p_{ig} = 1$. An arbitrary origin and unit were specified by setting the weighted mean scale value of the category boundaries at $a$ and their standard deviation at $b$:

$$\sum_{g=1}^{k} (t_g - a) \sum_{i=1}^{n} w_{ig} = 0, \tag{6}$$

$$\sum_{g=1}^{k} (t_g - a)^2 \sum_{i=1}^{n} w_{ig} = W b^2, \tag{7}$$

where $W = \sum_{g=1}^{k} \sum_{i=1}^{n} w_{ig}$.

These restrictions generalize for the weighted case the definitions used by Gulliksen in the unweighted case, (3) and (4). Since the $t$-scale is determined only within a linear transformation, $a$ and $b$ may be assigned any values desired; e.g., a convenient possibility for the origin, $a$, might be zero, and for the scale unit, $b$, which must be positive, might be unity.

Using two Lagrange multipliers, $\lambda$ and $\gamma$, the restrictions setting origin and unit may be included in the error term as follows:

$$Q = \sum_{i=1}^{n} \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)^2 + 2\lambda \Big( \sum_{g=1}^{k} t_g \sum_{i=1}^{n} w_{ig} - Wa \Big) - b^2 \gamma \Big[ \sum_{g=1}^{k} t_g^2 \sum_{i=1}^{n} w_{ig} - W(a^2 + b^2) \Big]. \tag{8}$$




Except for the weights, $w_{ig}$, (8) is identical to the term minimized by Gulliksen; see (2), (3), and (4). His solution, then, is the special case of the present one in which all of the weights are equal to unity, so that $\sum_{g} w_{ig} = k$ and $\sum_{i} w_{ig} = n$. Because of this restriction to unit weights, only data with complete overlap could be considered. Equation (8) is also similar to the term minimized by Tucker [22] in developing a least squares solution to the normal ogive model for categorical data, which is formally equivalent to the successive intervals situation (see [21], chapter 13). Tucker's solution, like the present one, involves weights and is iterative, but instead of minimizing the sum of squared differences between theoretical and estimated $t$-values, he minimized the sum of squared differences between theoretical and observed $z$-values.


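The differentiation of $Q$ with respect to each of the $m_i$ in turn yields the $n$ equations

$$\frac{1}{2}\frac{\partial Q}{\partial m_i} = \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g) \qquad (i = 1, 2, \ldots, n). \tag{9}$$

After expanding and setting the partial derivative equal to zero, the solution for $m_i$ can be written as

$$m_i = \frac{\sum_{g} w_{ig} t_g - s_i \sum_{g} w_{ig} z_{ig}}{\sum_{g} w_{ig}}. \tag{10}$$

$Q$ is now differentiated with respect to each $s_i$ in turn to yield the $n$ equations

$$\frac{1}{2}\frac{\partial Q}{\partial s_i} = \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)\, z_{ig} \qquad (i = 1, 2, \ldots, n). \tag{11}$$

Expanding, setting the partial derivative equal to zero, and substituting the value of $m_i$ from (10),

$$s_i = \frac{\big(\sum_{g} w_{ig}\big)\big(\sum_{g} w_{ig} z_{ig} t_g\big) - \big(\sum_{g} w_{ig} z_{ig}\big)\big(\sum_{g} w_{ig} t_g\big)}{\big(\sum_{g} w_{ig}\big)\big(\sum_{g} w_{ig} z_{ig}^2\big) - \big(\sum_{g} w_{ig} z_{ig}\big)^2}. \tag{12}$$

For the purpose of parenthetical comment upon the form of $s_i$, (12) can be rewritten in the following manner:

$$s_i = \frac{\sum_{g} w_{ig} (t_g - \bar t_i)(z_{ig} - \bar z_i)}{\sum_{g} w_{ig} (z_{ig} - \bar z_i)^2}, \tag{13}$$

where

$$\bar z_i = \frac{\sum_{g} w_{ig} z_{ig}}{\sum_{g} w_{ig}} \quad\text{and}\quad \bar t_i = \frac{\sum_{g} w_{ig} t_g}{\sum_{g} w_{ig}}.$$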
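Formulas (10), (12), and (13) amount to a weighted least-squares line of $t$ on $z$ for each stimulus, with missing entries handled by zero weights. The sketch below uses hypothetical category boundaries and a single stimulus generated exactly from (1) with $m_i = 0.2$ and $s_i = 0.8$, so both forms of $s_i$ should recover those values; the function names are illustrative.

```python
def fit_stimulus(z, w, t):
    """(12) for s_i and (10) for m_i: weighted least-squares line of t on z."""
    sw = sum(w)
    swz = sum(wg * zg for wg, zg in zip(w, z))
    swzz = sum(wg * zg * zg for wg, zg in zip(w, z))
    swt = sum(wg * tg for wg, tg in zip(w, t))
    swzt = sum(wg * zg * tg for wg, zg, tg in zip(w, z, t))
    s = (sw * swzt - swz * swt) / (sw * swzz - swz ** 2)  # (12)
    m = (swt - s * swz) / sw                              # (10)
    return m, s


def slope_deviation_form(z, w, t):
    """(13): the same s_i as a weighted covariance over a weighted variance."""
    sw = sum(w)
    zbar = sum(wg * zg for wg, zg in zip(w, z)) / sw
    tbar = sum(wg * tg for wg, tg in zip(w, t)) / sw
    num = sum(wg * (tg - tbar) * (zg - zbar) for wg, zg, tg in zip(w, z, t))
    den = sum(wg * (zg - zbar) ** 2 for wg, zg in zip(w, z))
    return num / den


t = [-1.5, -0.6, 0.1, 0.9, 1.8]   # hypothetical category boundaries
w = [1.0, 1.0, 0.5, 1.0, 0.0]     # last cell treated as missing (p = 1), so weight 0
# Deviates generated exactly from (1) with m_i = 0.2, s_i = 0.8; placeholder 0.0 where w = 0.
z = [(tg - 0.2) / 0.8 if wg > 0 else 0.0 for tg, wg in zip(t, w)]
m, s = fit_stimulus(z, w, t)
print(round(m, 6), round(s, 6), round(slope_deviation_form(z, w, t), 6))  # 0.2 0.8 0.8
```

The zero weight makes the placeholder deviate irrelevant, which is the mechanism the paper relies on for incomplete data.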


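It is apparent from the form of (13) that $s_i$ is the slope of a regression line. It is the coefficient for the regression of $t$ on $z$. This immediately suggests a graphical representation of the data, which will be presented in a later section. It is also interesting to note that Tucker's solution [22], which minimized the sum of squared errors in the $z$ direction instead of in the $t$ direction, involves the other regression, the regression of $z$ on $t$.

Utilizing the value of $s_i$ obtained in (12), it is also possible to rewrite the formula for $m_i$ in terms of $t_g$ as follows:

$$m_i = \frac{\big(\sum_{g} w_{ig} z_{ig}^2\big)\big(\sum_{g} w_{ig} t_g\big) - \big(\sum_{g} w_{ig} z_{ig}\big)\big(\sum_{g} w_{ig} z_{ig} t_g\big)}{\big(\sum_{g} w_{ig}\big)\big(\sum_{g} w_{ig} z_{ig}^2\big) - \big(\sum_{g} w_{ig} z_{ig}\big)^2}. \tag{14}$$

As will be seen in a later section, this is the form of $m_i$ which it will be convenient to use in computational routines.

Differentiating $Q$ with respect to each $t_g$ in turn yields the $k$ equations

$$\frac{1}{2}\frac{\partial Q}{\partial t_g} = -\sum_{i=1}^{n} w_{ig} (m_i + s_i z_{ig} - t_g) + \lambda \sum_{i=1}^{n} w_{ig} - b^2\gamma\, t_g \sum_{i=1}^{n} w_{ig} \qquad (g = 1, 2, \ldots, k). \tag{15}$$

Expanding and setting the partial derivative equal to zero,

$$t_g \sum_{i} w_{ig} - \sum_{i} w_{ig} (m_i + s_i z_{ig}) + \lambda \sum_{i} w_{ig} - b^2\gamma\, t_g \sum_{i} w_{ig} = 0. \tag{16}$$

Define

$$v_g = \frac{\sum_{i} w_{ig} (m_i + s_i z_{ig})}{\sum_{i} w_{ig}}. \tag{17}$$

Rearranging the terms of (16), multiplying both sides by $1 / \sum_i w_{ig}$, and substituting the definition (17),

$$(1 - b^2\gamma)\, t_g = v_g - \lambda. \tag{18}$$

The solution for $t_g$ can now be written as

$$t_g = \frac{v_g - \lambda}{1 - b^2\gamma}. \tag{19}$$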




Summing (16) over g and utilizing the definition of (6), 
k n 
(20) Wa — ΣΣ wim: + szu) + WU — Шау = 0. 
Consider the term > У)" wi, (m, + s;2;,). By interchanging the order of 


summation and by inserting the values of s; and m; given in (12) and (14), 
respectively, this term may be written as 


n k k 
> (n. Σω. + s, > vns) 
i 9 g 


ο (E οἱ 


= > n 
(21) i (> ο.» 2 wa) - (£ шь) 
= ΣΣ Wil, . 


Utilizing the definition of an origin given in (6), the term can be further 
simplified to 


n k n k 
(22) ΣΣ u, (m, + szi) = ΣΣ wit, = Wa. 
Now, from (20), 
(23) À = ay. 


It should be noted in passing that (22), in terms of the >, of (17), indicates 
that the weighted mean of v, is А 


Σο, Σω, 
(24) ? = = g, 


σοι Wis 


If the above value of À is substituted into (18) and (1 — b? y)a is sub- 
tracted from both sides of the equation, then (18) becomes 


(25) (1 — bY, — a) = v, — a. 
Squaring both sides of (25), multiplying through by 2J? w;, , and summing 
over g, 


(26) (== υγ)” > (t, — a)° > Wis = > (0, — а)? > Wig: 


Utilizing the definition of a scale unit from (7), (26) becomes 


f 


G. W. DIEDERICH, S. J. MESSICK, AND L. R TUCKER 165 


(27) a = ον = 


Using the value of $\lambda$ found in (23), the solution for $t_g$ given in (19) may be written as

$$t_g=\frac{v_g-a\gamma}{1-\gamma}=\frac{v_g-a}{1-\gamma}+a. \tag{28}$$


Substituting the value of $(1-\gamma)$ from (27), the solution for $t_g$ becomes

$$t_g=\frac{b\,(v_g-a)}{\sqrt{\sum_{h}(v_h-a)^2\sum_{i} w_{ih}\,/\,W}}+a. \tag{29}$$


It can also be shown that $E$, the sum of squares of errors, may be represented in terms of $\gamma$ as follows:

$$E=Wb^{2}\gamma. \tag{30}$$
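Representation (30) can be checked directly. The short derivation below is a sketch. It assumes that (16) has the multiplier form $t_g\sum_i w_{ig}-\sum_i w_{ig}(m_i+s_iz_{ig})+\lambda\sum_i w_{ig}-\gamma t_g\sum_i w_{ig}=0$, and that (7) fixes the unit through $\sum_g (t_g-a)^2\sum_i w_{ig}=Wb^2$. It also uses the least squares normal equations for $m_i$ and $s_i$, namely $\sum_g w_{ig}(t_g-m_i-s_iz_{ig})=0$ and $\sum_g w_{ig}z_{ig}(t_g-m_i-s_iz_{ig})=0$.

$$E=\sum_g\sum_i w_{ig}(t_g-m_i-s_iz_{ig})^2=\sum_g t_g\Big[t_g\sum_i w_{ig}-\sum_i w_{ig}(m_i+s_iz_{ig})\Big],$$

since the normal equations make the terms in $m_i$ and $s_i$ vanish. Three facts complete the step:

- By (16), the bracket equals $\gamma t_g\sum_i w_{ig}-\lambda\sum_i w_{ig}$.
- Together, (6) and (7) give $\sum_g t_g^2\sum_i w_{ig}=W(a^2+b^2)$.
- From (23), $\lambda=a\gamma$.

Hence

$$E=\gamma W(a^2+b^2)-a\gamma\cdot Wa=Wb^2\gamma,$$

which reduces to $E=W\gamma$ when $a=0$ and $b=1$.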


The Iterative Procedure 


The $t$-scale of successive intervals is determined only up to a linear transformation. The origin, $a$, and the scale unit, $b$, may therefore be set at any values desired. The values most convenient for computational routines are $a = 0$ and $b = 1$. With this origin and unit, the restrictions given in (6) and (7) may be restated as

$$\sum_{g=1}^{k} t_g\sum_{i=1}^{n} w_{ig}=0 \tag{31}$$

and

$$\sum_{g=1}^{k} t_g^{2}\sum_{i=1}^{n} w_{ig}=W, \tag{32}$$

respectively.


The formula for category boundaries given in (29) may also be rewritten in terms of this origin and unit as follows:

$$t_g=\frac{v_g}{\sqrt{\sum_{h} v_h^{2}\sum_{i} w_{ih}\,/\,W}}. \tag{33}$$


Now, (12), (13), (17), and (33) may be used to set up an iterative procedure that yields convergence values for $s_i$, $m_i$, and $t_g$. Given initial estimates, $t_{g1}$, of the category boundaries, (12) could be solved to obtain initial estimates, $s_{i1}$, of the discriminal dispersions. The initial $t$-estimates, $t_{g1}$, should first be converted to meet the restrictions of (31) and (32). Thus, suppose some set of $k$ numbers, $v_{g0}$, is available to estimate the category boundary values. Before these numbers can be used in the above solution, they must be converted to meet the restrictions of (31) and (32) as follows:

$$t_{g1}=\frac{v_{g0}-c_1}{c_2}, \tag{34}$$

where

$$c_1=\frac{1}{W}\sum_{g=1}^{k} v_{g0}\sum_{i=1}^{n} w_{ig}\quad\text{and}\quad c_2=\sqrt{\frac{1}{W}\sum_{g=1}^{k}(v_{g0}-c_1)^2\sum_{i=1}^{n} w_{ig}}.$$
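As a sketch only, the conversion (34) is a weighted standardization of the trial values, with each boundary weighted by its column total $\sum_i w_{ig}$. The function name and the `boundary_weights` argument are ours, not the authors', and the weights in the usage line are hypothetical.

```python
import math


def convert_to_restrictions(v0, boundary_weights):
    """Apply (34): rescale trial boundary values v_g0 so that (31) and (32) hold.

    boundary_weights[g] is the column total sum_i w_ig for boundary g.
    """
    W = sum(boundary_weights)
    # c1: weighted mean of the trial values, which fixes the origin (31)
    c1 = sum(v * wg for v, wg in zip(v0, boundary_weights)) / W
    # c2: weighted root-mean-square deviation about c1, which fixes the unit (32)
    c2 = math.sqrt(sum((v - c1) ** 2 * wg for v, wg in zip(v0, boundary_weights)) / W)
    return [(v - c1) / c2 for v in v0]


# Hypothetical column totals; equally spaced trial values 1, 2, 3.
t1 = convert_to_restrictions([1.0, 2.0, 3.0], [3.0, 5.0, 4.0])
print(t1)
```

The transformation is linear, so equally spaced trial values stay equally spaced after conversion.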


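Initial estimates of $s_i$ can then be obtained from (12) as follows:

$$s_{i1}=\frac{\sum_g w_{ig}\sum_g w_{ig}z_{ig}t_{g1}-\sum_g w_{ig}z_{ig}\sum_g w_{ig}t_{g1}}{\sum_g w_{ig}\sum_g w_{ig}z_{ig}^{2}-\big(\sum_g w_{ig}z_{ig}\big)^{2}}. \tag{35}$$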


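If a subscript $\alpha$ is introduced to indicate the $\alpha$th cycle in the iterative procedure, (35) can be rewritten as a formula for the $\alpha$th estimate of $s_i$:

$$s_{i\alpha}=A_i\sum_{g} w_{ig}z_{ig}t_{g\alpha}-B_i\sum_{g} w_{ig}t_{g\alpha}, \tag{36}$$

where

$$A_i=\frac{\sum_g w_{ig}}{\sum_g w_{ig}\sum_g w_{ig}z_{ig}^{2}-\big(\sum_g w_{ig}z_{ig}\big)^{2}} \tag{37}$$

and

$$B_i=\frac{\sum_g w_{ig}z_{ig}}{\sum_g w_{ig}\sum_g w_{ig}z_{ig}^{2}-\big(\sum_g w_{ig}z_{ig}\big)^{2}}. \tag{38}$$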


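Having found estimates for $t_g$ and $s_i$, the $\alpha$th estimate of the scale values may be obtained from (14) as follows:

$$m_{i\alpha}=C_i\sum_{g} w_{ig}t_{g\alpha}-B_i\sum_{g} w_{ig}z_{ig}t_{g\alpha}, \tag{39}$$

where

$$C_i=\frac{\sum_g w_{ig}z_{ig}^{2}}{\sum_g w_{ig}\sum_g w_{ig}z_{ig}^{2}-\big(\sum_g w_{ig}z_{ig}\big)^{2}}. \tag{40}$$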
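A minimal sketch of (36) through (40) for a single stimulus (row $i$) follows. The function names are hypothetical. Cells with zero weight are skipped, because $z_{ig}$ may be infinite when $p = 0$ or $1$. The example data lie exactly on a line, so the sketch should recover that line's slope and intercept. This illustrates that $s_i$ and $m_i$ are the weighted regression slope and intercept of $t$ on $z$.

```python
def row_coefficients(z_row, w_row):
    """A_i, B_i, C_i of (37), (38), (40); these depend only on the data."""
    cells = [(w, z) for w, z in zip(w_row, z_row) if w > 0]
    sw = sum(w for w, _ in cells)
    swz = sum(w * z for w, z in cells)
    swz2 = sum(w * z * z for w, z in cells)
    d = sw * swz2 - swz ** 2  # common denominator; needs two distinct z values
    return sw / d, swz / d, swz2 / d


def estimate_s_m(z_row, w_row, t):
    """(36) and (39): dispersion s_i and scale value m_i for boundaries t."""
    A, B, C = row_coefficients(z_row, w_row)
    cells = [(w, z, tg) for w, z, tg in zip(w_row, z_row, t) if w > 0]
    swt = sum(w * tg for w, _, tg in cells)
    swzt = sum(w * z * tg for w, z, tg in cells)
    s = A * swzt - B * swt   # (36): weighted regression slope of t on z
    m = C * swt - B * swzt   # (39): intercept at z = 0
    return s, m


# Hypothetical stimulus with m = 0.4 and s = 1.5; the infinite z gets weight 0.
t = [-1.0, 0.2, 1.3, 2.5]
z_row = [(tg - 0.4) / 1.5 for tg in t[:3]] + [float("inf")]
w_row = [2.0, 2.0, 1.0, 0.0]
s, m = estimate_s_m(z_row, w_row, t)
print(s, m)
```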




The components of (37), (38), and (40) come directly from the data. They are the same for all cycles of iteration and need be computed only once for the entire procedure. New estimates of $v_g$ may be obtained from a formula analogous to (17):

$$v_{g\alpha}=\frac{\sum_i w_{ig}m_{i\alpha}+\sum_i w_{ig}s_{i\alpha}z_{ig}}{\sum_i w_{ig}}. \tag{41}$$


A new estimate of the $t$-scale may now be found by using (33) as follows:

$$t_{g(\alpha+1)}=\frac{v_{g\alpha}}{\sqrt{\sum_h v_{h\alpha}^{2}\sum_i w_{ih}\,/\,W}}. \tag{42}$$


The procedure may then be repeated:

- Insert this value of $t_{g(\alpha+1)}$ into (36) to obtain $s_{i(\alpha+1)}$.
- Use this in turn to obtain $m_{i(\alpha+1)}$.
- Obtain $t_{g(\alpha+2)}$ from these.

This cycle may be iterated until two successive estimates of $t_g$ are as similar as desired, i.e., until $|t_{g(\alpha+1)}-t_{g\alpha}|$ is negligible.
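The following is one compact way the cycle just described might be coded. It assumes complete weight and deviate tables held as nested lists. All names are ours, and a fixed number of cycles stands in, for brevity, for the text's "as similar as desired" stopping rule. Each cycle fits $s_i$ and $m_i$ by (36) through (40), forms $v_g$ by (41), and rescales by (42). Each half-step is a least squares minimization over the quantities it updates, so the recorded error sum of squares should not increase from cycle to cycle. With errorless data, starting from the true boundaries reproduces them after one cycle.

```python
import math


def normalize(values, col):
    """Center and scale values so that (31) and (32) hold; col[g] = sum_i w_ig."""
    W = sum(col)
    c1 = sum(v * c for v, c in zip(values, col)) / W
    c2 = math.sqrt(sum((v - c1) ** 2 * c for v, c in zip(values, col)) / W)
    return [(v - c1) / c2 for v in values]


def successive_intervals(z, w, t, cycles=50):
    """Iterate (36)-(42). z and w are n-by-k nested lists; t meets (31)-(32)."""
    n, k = len(z), len(t)
    col = [sum(w[i][g] for i in range(n)) for g in range(k)]
    errors = []  # weighted error sum of squares after each fit of s_i and m_i
    for _ in range(cycles):
        s, m = [], []
        for i in range(n):
            # zero-weight cells (e.g. p = 0 or 1, infinite z) are left out
            cells = [(w[i][g], z[i][g], t[g]) for g in range(k) if w[i][g] > 0]
            sw = sum(wt for wt, _, _ in cells)
            swz = sum(wt * zz for wt, zz, _ in cells)
            swz2 = sum(wt * zz * zz for wt, zz, _ in cells)
            swt = sum(wt * tt for wt, _, tt in cells)
            swzt = sum(wt * zz * tt for wt, zz, tt in cells)
            d = sw * swz2 - swz ** 2
            s.append((sw * swzt - swz * swt) / d)    # (36)-(38)
            m.append((swz2 * swt - swz * swzt) / d)  # (39)-(40)
        errors.append(sum(w[i][g] * (t[g] - m[i] - s[i] * z[i][g]) ** 2
                          for i in range(n) for g in range(k) if w[i][g] > 0))
        # (41): weighted mean over stimuli of the fitted values m_i + s_i z_ig
        v = [sum(w[i][g] * (m[i] + s[i] * z[i][g]) for i in range(n) if w[i][g] > 0)
             / col[g] for g in range(k)]
        # (42): rescale to meet (32); the centering inside normalize() changes
        # nothing in exact arithmetic, since the weighted mean of v is already zero
        t = normalize(v, col)
    return t, s, m, errors


# Hypothetical errorless data: 4 stimuli, 3 boundaries, unit weights.
m_true, s_true = [-1.0, -0.3, 0.4, 1.1], [0.8, 1.2, 1.0, 0.6]
t_raw = [-1.2, 0.1, 1.4]
z = [[(tg - mi) / si for tg in t_raw] for mi, si in zip(m_true, s_true)]
w = [[1.0] * 3 for _ in range(4)]
col = [4.0, 4.0, 4.0]
t_true = normalize(t_raw, col)
t_fit, s_fit, m_fit, errors = successive_intervals(z, w, normalize([1.0, 2.0, 3.0], col))
print(t_true, t_fit, errors[-1])
```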

One step remains before the iterative procedure can be applied in practice: the initial estimation of the $t$-scale. An obvious starting point is a set of equally spaced numbers, such as the integers from 1 to $k$, converted by (34). Equally spaced $t_{g1}$ values amount to starting the iteration toward successive intervals from a set of "equal-appearing" intervals. It may also be possible to speed convergence by doubling or tripling the difference between successive estimates. That is, instead of using $t_{g(\alpha+1)}$ on the $(\alpha+1)$th trial, use $t_{g\alpha}+2\,(t_{g(\alpha+1)}-t_{g\alpha})$.
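The acceleration device might be sketched as the helper below, which is hypothetical and not part of the authors' notation. If both inputs satisfy (31), so does the extrapolated vector, since it is a linear combination of them. It will not in general satisfy (32). Re-applying the conversion of (34) before the next cycle is therefore a reasonable precaution, although the text does not say so.

```python
def accelerate(t_prev, t_next, factor=2.0):
    """Extrapolate t_prev + factor * (t_next - t_prev) for each boundary.

    factor=2.0 doubles the change between successive estimates; 3.0 triples it.
    """
    return [a + factor * (b - a) for a, b in zip(t_prev, t_next)]


print(accelerate([0.0, 1.0], [0.5, 1.5]))
```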

A cycle or two, and in some cases several, may be saved by using a computationally simple linear solution for $t_g$ as a first estimate. One of the simplest methods for estimating $t_g$ has been suggested by Garner and Hake [6] and by Edwards [3]. It involves averages of successive differences in $z_{ig}$ values. Such averages are estimates of $(t_g - t_{g-1})$ provided the discriminal dispersions may be assumed equal. Torgerson [21] also gives a simple algebraic ratio solution for $t_g$ which does not require equal $s_i$. Any of these algebraic solutions may be used to obtain initial estimates for the iterative procedure, but the labor involved might turn out to be as great as that in the cycle or two eliminated.
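A sketch of the successive-difference starting values follows. It assumes a complete table of finite $z_{ig}$ and uses a plain unweighted average over stimuli; the cited authors' exact weighting is not reproduced here. The result fixes only the spacing of the boundaries. Origin and unit remain arbitrary until the conversion of (34) is applied.

```python
def successive_difference_start(z):
    """Cumulate the average over stimuli of z_ig - z_i(g-1).

    Under equal discriminal dispersions each average estimates t_g - t_(g-1)
    in the common unit, so the cumulated values estimate the boundaries up to
    an arbitrary origin and unit.
    """
    n, k = len(z), len(z[0])
    t = [0.0]
    for g in range(1, k):
        mean_diff = sum(z[i][g] - z[i][g - 1] for i in range(n)) / n
        t.append(t[-1] + mean_diff)
    return t


# Hypothetical errorless data with equal dispersions s_i = 1.
t_true = [-1.0, 0.5, 1.25]
z = [[tg - mi for tg in t_true] for mi in (-0.5, 0.0, 0.7)]
start = successive_difference_start(z)
print(start)
```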

A comment is in order here on the weights, $w_{ig}$, used in the above least squares solution. It will be recalled that the only restriction placed upon the choice of these weights was that

$$w_{ig}=0\quad\text{and}\quad w_{ig}z_{ig}=0\qquad\text{when } p=0 \text{ or } p=1.$$




If $p$ equals neither zero nor one, the weights may be set at any values desired; e.g., $w_{ig}$ may be set equal to unity for $0 < p < 1$.

Another possibility would be the Müller-Urban weights, $z^2/pq$. Here $z$ denotes the ordinate of the normal distribution corresponding to a proportion $p$ (not the deviate $z_{ig}$), and $q = 1 - p$. A recent derivation of these weights is given by Finney [4]. The Müller-Urban weights are particularly appropriate for successive intervals data because they combine two virtues: they weight directly in proportion to the rate of change of $p$ with respect to $z$, and inversely to the variance of the proportions [8]. The reciprocal of the variance may be identified with quantity of information [5] and is directly related to the reliability of a proportion. These weights are therefore also directly proportional to reliability and to the information available from the observations. It can also be shown that, to an approximation, the Müller-Urban weights are the proper values for weighting normally transformed scores inversely to their variance (see [11], p. 206).

However, a simpler set of weights than $z^2/pq$ can be used without entirely losing the distinction between reliable and unreliable $z_{ig}$ values. One possible rule assigns zero weight if the corresponding proportion contains less than some specified fraction, 1/7, of the maximum possible information (corresponding to $p = .5$). It assigns unit weight if the proportion contains more than that fraction of the maximum information. Alternatively, all $|z_{ig}| > c$ could be weighted zero and all $|z_{ig}| \le c$ weighted unity. Such a rule with $c = 2$ has been found convenient in practice [7, 8]. A simple set of unit and zero weights also simplifies some of the procedures in the above iterative solution.
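The two kinds of weight described above can be sketched with Python's standard `statistics.NormalDist`. The function names are ours. The ordinate is taken as the normal density at the deviate for $p$, so at $p = .5$ the Müller-Urban weight reaches its maximum, $2/\pi$.

```python
import math
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()        # mean 0, standard deviation 1
MAX_WEIGHT = 2 / math.pi           # Müller-Urban weight at p = .5


def muller_urban_weight(p):
    """z^2 / (p q), with z the normal ordinate at the deviate for p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # weight zero when p = 0 or p = 1
    ordinate = _UNIT_NORMAL.pdf(_UNIT_NORMAL.inv_cdf(p))
    return ordinate ** 2 / (p * (1.0 - p))


def cutoff_weight(z_dev, c=2.0):
    """Unit weight for |z_ig| <= c and zero weight otherwise."""
    return 1.0 if abs(z_dev) <= c else 0.0


for p in (0.5, 0.8, 0.95, 0.99):
    print(p, round(muller_urban_weight(p), 4), cutoff_weight(_UNIT_NORMAL.inv_cdf(p)))
```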


Summary and Illustration of Analytical Procedures 

The analytical procedures involved in the above least squares solution for successive intervals will now be summarized, and an errorless numerical example will be used to illustrate the computational routine.

1. The experimental method of successive intervals yields the category (1 to $k + 1$) into which each of $n$ stimuli was placed by each of $N$ individuals. These data may be summarized in an $n \times (k + 1)$ table whose cell entries, $f_{ig}$, represent the number of times the $i$th stimulus was placed in the $g$th category. Cumulate the frequencies in each row of this table so that each entry represents the number of times the $i$th stimulus appeared below the $g$th category boundary, $t_g$. This gives a set of cumulated frequencies, $F_{ig}$, which can be taken as the starting point for successive intervals analysis.

2. The cumulated frequencies are then converted into proportions, $p_{ig}$, and then to normal deviate values, $z_{ig}$. To illustrate the computational procedures, consider the set of $z_{ig}$ values presented in Table 1. Table 1 also gives the four scale values, $m_i$, the four discriminal dispersions, $s_i$, and the three category boundaries, $t_g$, that exactly fit these $z_{ig}$ values under the restrictions of (31) and (32). Because this "true" set of scale values and category boundaries is known, the convergence of the above iterative solution can be illustrated.

TABLE 2
First Iteration in the Numerical Example, Beginning with Equally Spaced $t_{g1}$
[Table entries not recoverable from the source scan.]

TABLE 3
Second Iteration in the Numerical Example
[Table entries not recoverable from the source scan.]

3. A set of weights, $w_{ig}$, and an initial estimate, $t_{g1}$, of the category




boundaries must now be determined. For the present example, the weights given in Table 1 were used. They were assigned so that

$$w_{ig}=\begin{cases}0 & \text{for } |z_{ig}|>3.0,\\ 1 & \text{for } 3.0\ge|z_{ig}|\ge 2.0,\\ 2 & \text{for } |z_{ig}|<2.0.\end{cases}$$


An equally spaced scale was used as the first estimate of $t_g$. Using the integers 1, 2, and 3 as $v_{g0}$, the conversion of (34) produced as $t_{g1}$ the values $-1.16422$, $.15523$, and $1.47469$. Note that if the "true" $t_g$ values given in Table 1 were used as first estimates in the present iterative procedure, they would be exactly reproduced at the end of one cycle.

4. Now, the coefficients $A_i$, $B_i$, and $C_i$ may be computed according to (37), (38), and (40), respectively. These values are presented in Table 2, along with the values of

$$\sum_{g} w_{ig}t_{g1}\quad\text{and}\quad\sum_{g} w_{ig}z_{ig}t_{g1}.$$


5. Enough information is now available to solve for first estimates of the discriminal dispersions, $s_{i1}$, using (36) (see Table 2).

6. Now, first estimates of the scale values, $m_{i1}$, can be obtained, using (39) (see Table 2).

7. New estimates, $t_{g2}$, of the category boundaries may now be found from (41) and (42). Note that the $t_{g2}$ given in Table 2 is closer to the "true" scale than $t_{g1}$ was.

8. To iterate this solution, compute new values of $\sum_g w_{ig}t_{g2}$ and $\sum_g w_{ig}z_{ig}t_{g2}$. Using these values, (36) and (39) may be solved to obtain $s_{i2}$ and $m_{i2}$, respectively. Then (41) and (42) may be used to obtain $t_{g3}$, the third estimate of the $t$-scale (see Table 3). Again, the estimates of $t_g$ approach the "true" $t_g$ scale more closely at each successive cycle. The procedure may be repeated until two successive $t$-estimates are as similar as desired.
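Steps 1 through 3 can be sketched for a single stimulus as follows. The frequencies are hypothetical. `None` marks a cumulated proportion of 0 or 1, whose deviate would be infinite. The three-level weights follow the rule stated for the numerical example, but the treatment of values falling exactly on 2.0 or 3.0 is an assumption of this sketch.

```python
from statistics import NormalDist


def cumulated_deviates(freq_row):
    """Steps 1-2: category frequencies f_ig (k + 1 categories) -> k deviates z_ig."""
    N = sum(freq_row)
    deviates, F = [], 0
    for f in freq_row[:-1]:      # one boundary between each pair of adjacent categories
        F += f                   # F_ig: placements below boundary g
        p = F / N                # p_ig
        deviates.append(NormalDist().inv_cdf(p) if 0.0 < p < 1.0 else None)
    return deviates


def example_weight(z_dev):
    """Step 3: weights 0, 1, 2 by the size of |z_ig| (None means p = 0 or 1)."""
    if z_dev is None or abs(z_dev) > 3.0:
        return 0
    return 1 if abs(z_dev) >= 2.0 else 2


z_row = cumulated_deviates([10, 20, 40, 20, 10])   # hypothetical row, N = 100
print(z_row, [example_weight(z) for z in z_row])
```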


A Graphical Successive Intervals Scaling Procedure 

It was seen from (13) that $s_i$ is the regression coefficient for the regression of $t$ on $z$. This suggests a graphical solution for successive intervals, which will be summarized below. The procedures to be presented bear some similarity to the graphical methods of Mosier [13] and Garner and Hake [6].

1-3. Steps 1 through 3 of the graphical procedure are identical to the corresponding steps of the above analytical procedure. To use the graphical method, a first estimate, $t_{g1}$, of the category boundaries must




be available, along with normal deviate values and their corresponding 
weights. 

4. The estimated $t_{g1}$ values are then marked off as the ordinate of a graph with $z$ values as the abscissa. The $t_{g1}$ values are horizontal lines that hold for all stimuli, so several plots can be made on one graph (see Figure 1). For each stimulus, the $z_{ig}$ values are plotted at the appropriate $t_{g1}$ points. For stimulus 2 in the above numerical example, for instance, points would be plotted at ($t = -1.164$, $z = -.5$), ($t = .155$, $z = .5$), and ($t = 1.475$, $z = 2.0$), as illustrated in Figure 1. Weights can be applied in the graphical procedure by clustering around each point a number of dots proportional to the corresponding weight. A straight line can then be fitted to the points by eye, with the larger dot clusters given more emphasis in determining the slope of the line.

FIGURE 1
Graphical Solution for Numerical Example Beginning with Equally Spaced $t_{g1}$
(ordinate: $t$ scale; abscissa: $z$)

5. The equation of each of these lines can be written as

$$t_{g\alpha}=m_{i\alpha}+s_{i\alpha}z_{ig}. \tag{43}$$

The slope, $s_{i\alpha}$, of each line is the $\alpha$th estimate of the discriminal dispersion. The intercept, $m_{i\alpha}$, at $z = 0$ is the $\alpha$th estimate of the scale value.




These intercepts and slopes can be read directly from the graph, but they need not be recorded until the final iteration.

6. In practice, the straight lines fitted to the plotted values will rarely pass through every point. Each point will usually deviate from the line by some amount, and the vertical deviation represents a scaling error (see Figure 1). Projecting a plotted point vertically onto the fitted line produces another point, $\hat t_{ig}$. Because it lies directly on the line, this point is a theoretical, or fitted, estimate of $t_g$ (see Figure 1). For a given category boundary there are $n$ fitted estimates $\hat t_{ig}$, one for each stimulus. If the ordinates of these $\hat t_{ig}$ values are recorded, weighted averages of the ordinates can be used to obtain a new estimate of $t_g$ as follows:

$$u_{g(\alpha+1)}=\frac{\sum_i w_{ig}\hat t_{ig}}{\sum_i w_{ig}}, \tag{44}$$

$$t_{g(\alpha+1)}=\frac{u_{g(\alpha+1)}-d_{1(\alpha+1)}}{d_{2(\alpha+1)}}, \tag{45}$$

where

$$d_{1(\alpha+1)}=\frac{1}{W}\sum_g u_{g(\alpha+1)}\sum_i w_{ig}$$

and

$$d_{2(\alpha+1)}=\sqrt{\frac{1}{W}\sum_g\big(u_{g(\alpha+1)}-d_{1(\alpha+1)}\big)^2\sum_i w_{ig}}.$$

The only values that need be read from the graphs in going from one iterative cycle to the next, then, are the $\hat t_{ig}$. The slopes and intercepts corresponding to discriminal dispersions and scale values need not be recorded until the final iteration.

7. This new estimate of $t_g$ may now be plotted as the ordinate of a graph with $z$ values marked off as the abscissa. The cycle may then be repeated beginning at step 4 until two successive $t$-estimates are as similar as desired.
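For the graphical route, the arithmetic of (44) and (45) is all that remains once the lines have been drawn. In this sketch the eye-fitted lines are supplied as hypothetical (intercept, slope) pairs, and the ordinates $\hat t_{ig}$ are computed from them rather than read off a graph. The function and variable names are ours.

```python
import math


def graphical_update(lines, z, w):
    """One pass of (44)-(45). lines[i] = (intercept, slope) of the line for stimulus i."""
    n, k = len(z), len(z[0])
    col = [sum(w[i][g] for i in range(n)) for g in range(k)]
    W = sum(col)
    # (44): weighted average over stimuli of the fitted ordinates m_i + s_i z_ig
    u = [sum(w[i][g] * (lines[i][0] + lines[i][1] * z[i][g])
             for i in range(n) if w[i][g] > 0) / col[g] for g in range(k)]
    # (45): shift and rescale so that the restrictions (31) and (32) hold again
    d1 = sum(ug * cg for ug, cg in zip(u, col)) / W
    d2 = math.sqrt(sum((ug - d1) ** 2 * cg for ug, cg in zip(u, col)) / W)
    return [(ug - d1) / d2 for ug in u]


lines = [(0.1, 1.2), (0.5, 0.9)]                # hypothetical eye-fitted lines
z = [[-1.0, 0.0, 1.0], [-2.0, -0.5, 1.0]]
w = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
t_next = graphical_update(lines, z, w)
print(t_next)
```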


REFERENCES 
[1] Attneave, F. A method of graded dichotomies for the scaling of judgments. Psychol. Rev., 1949, 56, 334-340.
[2] Bishop, Ruth. Points of neutrality in social attitudes of delinquents and non-delinquents. Psychometrika, 1940, 5, 35-45.
[3] Edwards, A. L. The scaling of stimuli by the method of successive intervals. J. appl. Psychol., 1952, 36, 118-122.
[4] Finney, D. J. Probit analysis. New York: Cambridge Univer. Press, 1952.
[5] Fisher, R. A. Theory of statistical estimation. Proc. Camb. Phil. Soc., 1925, 22, 700-725.
[6] Garner, W. R. and Hake, H. W. The amount of information in absolute judgments. Psychol. Rev., 1951, 58, 446-459.
[7] Green, B. F. Attitude measurement. In G. Lindzey (Ed.), Handbook of social psychology. Cambridge, Mass.: Addison-Wesley, 1954.
[8] Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.
[9] Guilford, J. P. The computation of psychological values from judgments in absolute categories. J. exp. Psychol., 1938, 22, 32-42.
[10] Gulliksen, H. A least squares solution for successive intervals assuming unequal standard deviations. Psychometrika, 1954, 19, 117-139.
[11] Kendall, M. G. The advanced theory of statistics. Vol. II. London: Griffin, 1948.
[12] Messick, S., Tucker, L., and Garrison, H. A punched card procedure for the method of successive intervals. Princeton: Educational Testing Service, Research Bulletin 55-25.
[13] Mosier, C. I. A modification of the method of successive intervals. Psychometrika, 1940, 5, 101-107.
[14] Rimoldi, H. J. A. and Hormaeche, Marceva. The law of comparative judgment in the successive intervals and graphic rating scale methods. Princeton: Educational Testing Service, Research Bulletin 54-5.
[15] Saffir, M. A comparative study of scales constructed by three psychophysical methods. Psychometrika, 1937, 2, 179-198.
[16] Thurstone, L. L. A method of scaling psychological and educational tests. J. educ. Psychol., 1925, 16, 433-451.
[17] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 424-432.
[18] Thurstone, L. L. The unit of measurement in educational scales. J. educ. Psychol., 1927, 18, 505-524.
[19] Thurstone, L. L. and Chave, E. J. The measurement of attitude. Chicago: Univer. Chicago Press, 1929.
[20] Torgerson, W. S. A law of categorical judgment. In L. S. Clark (Ed.), Consumer behavior. New York: New York Univer. Press, 1954.
[21] Torgerson, W. S. Theory and method of scaling. Social Science Research Council (to be published).
[22] Tucker, L. R. A level of proficiency scale for a unidimensional skill. Amer. Psychologist, 1952, 7, 408 (Abstract).


Manuscript received 5/10/56


Revised manuscript received 10/25/56 


PSYCHOMETRIKA—VOL. 22, NO. 2
JUNE, 1957 


ON THE APPLICATIONS OF 
THE METHOD OF ABSOLUTE SCALING* 


CHUNG-TEH FAN
EDUCATIONAL TESTING SERVICE 


Empirical and fictitious examples are described for investigating the
applications of the absolute scaling method for item scaling and score scaling.
A discrepancy between the correct values and the values estimated through
the absolute scaling method is demonstrated. It is concluded that when the
groups are different, the assumption of an identity between test score
conversion and item difficulty conversion is not met.


Before investigating the applications of the absolute scaling method for item scaling and score scaling, a brief explanation of Thurstone's fundamental equations [2] will be presented.

Suppose ability scores are known for two groups of people, Groups 1 and 2, and suppose the ability scores are normally distributed for both groups. The relationship between these groups can be described by an expression derived as follows:

$$(X-M_1)/\sigma_1=x_1, \tag{1}$$

$$(X-M_2)/\sigma_2=x_2, \tag{2}$$

where $X$ is the ability score, $M_1$, $\sigma_1$ and $M_2$, $\sigma_2$ are the ability score means and standard deviations, and $x_1$ and $x_2$ are sigma values of the ability scores for Group 1 and Group 2, respectively. Solving (2) for $X$ and substituting in (1) gives

$$x_1=(\sigma_2/\sigma_1)\,x_2+(M_2-M_1)/\sigma_1. \tag{3}$$


Thurstone assumes that this relationship can also be expressed in terms of measures of item difficulty. The difficulty, $p$, of each item is determined for Group 1 and Group 2 separately, and each $p$-value is converted to its corresponding normal deviate. The normal deviates $(x_1^{*}, x_2^{*})$ can then be plotted for Group 1 versus Group 2, and the relationship between them can be expressed as

$$(x_1^{*}-m_1)/s_1=(x_2^{*}-m_2)/s_2,$$


*Acknowledgment is due to Dr. Gulliksen, Dr. Lord, Dr. Swineford, and Dr. Tucker for their criticism and advice.




from which

$$x_1^{*}=(s_1/s_2)\,x_2^{*}+m_1-(s_1/s_2)\,m_2, \tag{4}$$

where $m_1$, $m_2$ and $s_1$, $s_2$ are the means and standard deviations of the normal deviate values for the two groups.

Assuming that (3) and (4) are alternative expressions for the same relationship, Thurstone concludes that the slopes are equal, or

$$\sigma_2/\sigma_1=s_1/s_2, \tag{5}$$

and that the intercepts are equal, or

$$(M_2-M_1)/\sigma_1=m_1-(s_1/s_2)\,m_2. \tag{6}$$


It should be pointed out here that the identity of (3) and (4) cannot be 
demonstrated mathematically. Although there is a simple relationship 
between item difficulty and test score mean, there is no corresponding 
relationship between item difficulty and test score standard deviation without 
also taking into account the correlations between item scores and test scores. 
These correlations are not involved in either (3) or (4). Therefore, (5) and 
(6) can represent, at best, only approximations. In situations where the
correlations can safely be ignored, Thurstone's assumptions can lead to a 
very convenient scaling method. 

When the groups are similar, this scaling procedure produces useful 
results. When they are different, its value is open to doubt. The remainder 
of this paper will describe examples, using both real and fictitious data, 
that should serve to indicate the extent of the discrepancies that may be 
expected when the groups differ in certain specified respects. Accepting, 
then, Thurstone’s assumptions and an assumption of an identity between 
the test scores and the underlying ability, (5) and (6) may be applied to item 
scaling and score scaling in the following ways: 

For item scaling. To estimate the mean and standard deviation of the 
item difficulties (hereafter “item difficulty” refers to normal deviate value 
only) for Group 2 when the same test is given to Groups 1 and 2 and item- 
analysis data are available for only Group 1, (5) and (6) can be combined 
to give 


$$\hat m_2=\big[m_1-(M_2-M_1)/\sigma_1\big](\sigma_1/\sigma_2), \tag{7}$$

$$\hat s_2=s_1(\sigma_1/\sigma_2), \tag{8}$$

where $\hat m_2$ and $\hat s_2$ are the estimated mean and standard deviation of the item difficulties for Group 2; $m_1$ and $s_1$ are the observed mean and standard deviation of the item difficulties for Group 1; and $M_1$, $\sigma_1$ and $M_2$, $\sigma_2$ are the test score means and standard deviations for the two groups.

For score scaling. To estimate the test score mean and standard deviation
of test Form 1 for Group 2 (Group 1 takes only test Form 1 and Group 2


CHUNG-TEH FAN 177


takes only test Form 2) when a set of common items is contained in both test forms and when item-analysis data are available for both groups, (5) and (6) can be employed as follows:

$$\hat M_2=\sigma_1\big[m_1-(s_1/s_2)\,m_2\big]+M_1, \tag{9}$$

$$\hat\sigma_2=\sigma_1(s_1/s_2), \tag{10}$$

where $m_1$, $s_1$ and $m_2$, $s_2$ are the observed means and standard deviations of the difficulties of the items common to the two forms, and $\hat M_2$ and $\hat\sigma_2$ are the estimated mean and standard deviation for Group 2 on the same scale used for the Group 1 data [4].

An empirical check on the applications of (7) through (10) can be made 
if the same test is given to two groups and if item analyses are obtained 
separately for these groups. There are two practical difficulties, however, in 
making such an empirical check. The first is to locate results for the same 
test given to two groups which are known to be different. The second is to 
avoid the complicating factor of the drop-outs (the items not answered by 
all the examinees toward the end of the test). If the two groups are not very 
different, the discrepancy between the estimated values and the correct 
values will not be very great in any case. If the two groups drop out
at different rates, the drop-outs would also affect the item difficulties and test
scores differently for the two groups.

Recently, one form of the Selective Service College Qualification Test, 
which was item-analyzed separately for each of the four college classes, was 
found adequate for such an empirical check. The observed means and standard 
deviations of item difficulties and subtest scores (raw scores) can be sub- 
stituted in (5) through (10) for verifying the assumed relationships for any 
two of the classes. A discrepancy between the observed values and estimated
values is found for any two of the classes, particularly for freshmen and 
seniors, the two extreme groups. In order to keep this paper short, only one 
representative example will be described for illustration. 

A subtest of the same form, designed to measure data interpretation, 
Items 61-90 (29 items; Item 73 was not scored), has the following statistics 
for freshmen and seniors, all of whom attempted every item (random sample 
of 500 papers for each group): 

Raw score mean and standard deviation:

Freshmen $M_1 = 20.0340$, $\sigma_1 = 4.2555$;
Seniors $M_2 = 22.5000$, $\sigma_2 = 4.2631$.

Item difficulty mean and standard deviation (for item data see Table 1):

Freshmen $m_1 = -.5810$, $s_1 = .6333$;
Seniors $m_2 = -.8745$, $s_2 = .5647$.




TABLE 1

Item Statistics for a Set of Twenty-nine Items Administered
to 500 Freshmen and 500 Seniors in College

         Proportion of      Proportion of
Item     Freshmen Who       Seniors Who        Difficulty     Difficulty
Number   Answer Correctly   Answer Correctly   for Freshmen   for Seniors
         p_F                p_S                x_F            x_S

61       .862               .898               -1.09          -1.27
62       .888               .926               -1.22          -1.45
63       .830               .892               -0.95          -1.24
64       .794               .852               -0.82          -1.05
65       .818               .885               -0.91          -1.19
66       .508               ?                  -0.02          -0.19
67       .910               ?                  -1.34          -1.59
68       .920               .936               -1.41          -1.52
69       .880               .928               -1.17          -1.46
70       ?                  .840               -0.65          -0.99
71       .760               .852               -0.71          -1.05
72       .718               .836               -0.58          -0.98
74       .522               .652               -0.06          -0.39
75       .868               .918               -1.12          -1.39
76       .888               ?                  -1.22          -1.57
77       .608               .716               -0.27          -0.57
78       .604               ?                  -0.26          -0.69
79       .506               .610               -0.02          -0.28
80       .636               ?                  -0.35          -0.78
81       .328               .518               +0.45          -0.05
82       .906               .930               -1.32          -1.48
83       .920               ?                  -1.41          -1.57
84       ?                  ?                  -0.79          -1.05
85       .836               .918               -0.98          -1.17
86       .690               .724               -0.50          -0.59
87       .430               .516               +0.18          -0.04
88       .330               .530               +0.44          -0.08
89       .358               .490               +0.36          +0.03
90       .188               .386               +0.89          +0.29

(? : entry illegible in the source scan)

Correlation between item difficulties for freshmen and seniors: $r = .0807$.


Let us see to what extent these observed data satisfy (3), (4), and (7) through (10). Equation (3) gives

$$x_1=1.00\,x_2+.58.$$

Equation (4) gives

$$x_1^{*}=1.12\,x_2^{*}+.40.$$

Neither the slopes nor the intercepts are equal, as (5) and (6) require them to be.

Similarly, for estimating the mean and standard deviation of item difficulties for the senior group from the item-analysis data for the freshman group and the score data, (7) gives

$$\hat m_2=\big[m_1-(M_2-M_1)/\sigma_1\big](\sigma_1/\sigma_2)=-1.16,$$




instead of the observed value, $m_2 = -.87$. Equation (8) gives

$$\hat s_2=s_1(\sigma_1/\sigma_2)=.63,$$

instead of the observed value, $s_2 = .56$.
For estimating the test score mean and standard deviation for the senior group from the observed item difficulty statistics, (9) gives

$$\hat M_2=\sigma_1\big[m_1-(s_1/s_2)\,m_2\big]+M_1=21.74,$$

instead of the observed value, $M_2 = 22.50$. Equation (10) gives

$$\hat\sigma_2=\sigma_1(s_1/s_2)=4.77,$$

instead of the observed value, $\sigma_2 = 4.26$.
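Because (3), (4), and (7) through (10) are simple closed forms, the comparisons above are easy to recompute. The small sketch below (function name ours) is fed the reported freshman and senior statistics. Up to rounding, it should return the values quoted in the text.

```python
def absolute_scaling(M1, sd1, M2, sd2, m1, s1, m2, s2):
    """Conversion lines (3) and (4) and the estimates (7)-(10)."""
    line3 = (sd2 / sd1, (M2 - M1) / sd1)           # (3): slope, intercept from scores
    line4 = (s1 / s2, m1 - (s1 / s2) * m2)         # (4): slope, intercept from items
    m2_hat = (m1 - (M2 - M1) / sd1) * (sd1 / sd2)  # (7)
    s2_hat = s1 * (sd1 / sd2)                      # (8)
    M2_hat = sd1 * (m1 - (s1 / s2) * m2) + M1      # (9)
    sd2_hat = sd1 * (s1 / s2)                      # (10)
    return line3, line4, (m2_hat, s2_hat), (M2_hat, sd2_hat)


# Freshmen (Group 1) and seniors (Group 2), statistics as reported in the text.
line3, line4, item_est, score_est = absolute_scaling(
    20.0340, 4.2555, 22.5000, 4.2631, -0.5810, 0.6333, -0.8745, 0.5647)
print("(3):", [round(x, 2) for x in line3], " (4):", [round(x, 2) for x in line4])
print("(7), (8):", [round(x, 2) for x in item_est])
print("(9), (10):", [round(x, 2) for x in score_est])
```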
The empirical comparison of estimated and observed values is also shown in Figure 1. In Figure 1 the dotted line represents (3), based on test score data. The solid line represents (4), based on item-difficulty data. (Each dot represents an item.) The discrepancy between the two lines seems too great to be ignored.

Figure 1
Application of Equations (3) and (4) to real data
(ordinate: sigma values for seniors; abscissa: sigma values for freshmen)

Observed data are always subject to sampling error, to which this discrepancy might conceivably be ascribed. Let us, therefore, set up a fictitious example in order to avoid the possible effects of such errors.

The fictitious example needs the following formulas, which express the mean and standard deviation of test scores in terms of item statistics:

$$M=\sum_j p_j, \tag{11}$$

$$\sigma=\sum_j r_j z_j, \tag{12}$$

where $M$ is the raw-score mean, $p_j$ is the proportion of correct responses on item $j$ ($p_j = R_j/N$), $\sigma$ is the raw-score standard deviation, $r_j$ is the biserial correlation for item $j$, and $z_j$ is the ordinate of the unit normal curve at the point $x_j$, the item difficulty for item $j$.

Equation (11) is obvious. A quick derivation of (12) follows from an equation proved by Gulliksen [1]. In his Chapter 21, equation (20) is

$$\sigma=\sum_j r_{tj}\,s_j, \tag{13}$$

where $r_{tj}$ is the point biserial correlation between item score and test score, and $s_j$ is the item standard deviation, which can be written $\sqrt{p_j(1-p_j)}$.

The point biserial correlation can be expressed in terms of the biserial correlation:

$$r_{tj}=r_j\,(z_j/s_j). \tag{14}$$

Substituting (14) in (13) gives (12). No approximations were used in deriving (11) and (12). Therefore, when $r_j$ and $z_j$ are available for all the items in the test, they reproduce the mean and standard deviation of the raw test scores exactly.

Suppose that a test given to two groups is composed of 100 items with the following statistics. There is one set of 50 equivalent items, as follows:

$p_1 = .31$ ($x_1 = .50$, $z_1 = .35$), $r_1 = .30$, for Group 1,
$p_2 = .50$ ($x_2 = .00$, $z_2 = .40$), $r_2 = .30$, for Group 2;

and another set of 50 equivalent items, as follows:

$p_1 = .69$ ($x_1 = -.50$, $z_1 = .35$), $r_1 = .30$, for Group 1,
$p_2 = .84$ ($x_2 = -1.00$, $z_2 = .24$), $r_2 = .30$, for Group 2.


Although these fictitious data are unrealistic in a sense, none of the numerical values appears to be unreasonable. The means and standard deviations of the test scores can be computed from (11) and (12):

$M_1 = 50$, $M_2 = 67$,
$\sigma_1 = 10.5$, $\sigma_2 = 9.6$.

The means and standard deviations of the item difficulties ($x$) can be computed directly:

$m_1 = .00$, $m_2 = -.50$,
$s_1 = .50$, $s_2 = .50$.


Now let us apply these fictitious data to (3), (4), and (7) through (10):

(3) gives $x_1 = .91\,x_2 + 1.62$.
(4) gives $x_1^{*} = x_2^{*} + .50$.
(7) gives $\hat m_2 = -1.77$, instead of $m_2 = -.50$.
(8) gives $\hat s_2 = .55$, instead of $s_2 = .50$.
(9) gives $\hat M_2 = 55.25$, instead of $M_2 = 67$.
(10) gives $\hat\sigma_2 = 10.5$, instead of $\sigma_2 = 9.6$.
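The fictitious example can be recomputed from (11) and (12) in the same spirit. The sketch below (names ours) takes the rounded item statistics as given. It uses $p_1 = .31$ for the first set of items, the value consistent with the stated $x_1 = .50$ and $M_1 = 50$. It computes the standard deviations of item difficulties with $N$ rather than $N - 1$ in the denominator; this is an assumption, since the text does not say which was used. It prints slopes .91 and 1.00 and intercepts 1.62 and .50 for (3) and (4), matching the values above.

```python
from statistics import mean, pstdev


def score_and_item_stats(items):
    """items: (p_j, x_j, ordinate z_j, biserial r_j) for each item."""
    M = sum(p for p, _, _, _ in items)          # (11): raw-score mean
    sd = sum(r * z for _, _, z, r in items)     # (12): raw-score standard deviation
    xs = [x for _, x, _, _ in items]
    return M, sd, mean(xs), pstdev(xs)


group1 = [(0.31, 0.50, 0.35, 0.30)] * 50 + [(0.69, -0.50, 0.35, 0.30)] * 50
group2 = [(0.50, 0.00, 0.40, 0.30)] * 50 + [(0.84, -1.00, 0.24, 0.30)] * 50
M1, sd1, m1, s1 = score_and_item_stats(group1)
M2, sd2, m2, s2 = score_and_item_stats(group2)
print(f"(3): x1 = {sd2 / sd1:.2f} x2 + {(M2 - M1) / sd1:.2f}")
print(f"(4): x1* = {s1 / s2:.2f} x2* + {m1 - (s1 / s2) * m2:.2f}")
```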

Figure 2 shows the discrepancy between the dotted line based on scores and the solid line based on item difficulties. The conversion line based on item difficulties and the conversion line based on test scores are not the same line.

Figure 2
Application of Equations (3) and (4) to fictitious data
(ordinate: sigma values for Group 2; abscissa: sigma values for Group 1)




This fictitious example has eliminated the problem of unreliability of 
the data and also avoided the problem of assumptions of normality. The 
finding, consequently, leads us to question the relationship of slopes and 
intercepts of test score conversion and item difficulty conversion explicitly 
stated in (5) and (6), which are the direct consequences of the assumption 
of an identity of test score conversion and item difficulty conversion, (3) 
and (4). Let us investigate further (5) and (6). 

Suppose a test is given to two groups, and either of the following two 
simple cases occurs: 

A. The means are equal but the standard deviations are different. 
B. The means are different but the standard deviations are equal. 
An equating method which is appropriate for the more general case should 

certainly work for these two simple cases. 

In Case A it is assumed that $M_1 = M_2$, $\sigma_1 \ne \sigma_2$, $m_1 = m_2$, and $s_1 \ne s_2$. But when $M_1 = M_2$ and $m_1 = m_2$, (6),

$$(M_2-M_1)/\sigma_1=m_1-(s_1/s_2)\,m_2,$$

gives (provided $m_1 \ne 0$) no alternative but

$$s_1/s_2=1,\quad\text{or}\quad s_1=s_2.$$

And thus (5),

$$\sigma_2/\sigma_1=s_1/s_2,$$

also gives no alternative but

$$\sigma_2/\sigma_1=1,\quad\text{or}\quad\sigma_1=\sigma_2.$$

The results, therefore, contradict our assumptions.
In Case B it is assumed that $M_1 \ne M_2$, $\sigma_1 = \sigma_2$, $m_1 \ne m_2$, and $s_1 = s_2$. Clearly (5) is consistent with these assumptions. But (6) gives

$$(M_2-M_1)/\sigma=m_1-m_2,$$

where $\sigma = \sigma_1 = \sigma_2$.


This result implies that when $\sigma_1 = \sigma_2$ and $s_1 = s_2$, the intercept of the conversion line based on test score means and standard deviations varies with the test score standard deviations. The intercept of the conversion line based on item difficulty statistics, by contrast, is independent of the standard deviations of item difficulties. This is evidently not true. It is conceivable that when $\sigma_1$ and $\sigma_2$ (here $\sigma_1 = \sigma_2$) are small and the difference between $M_1$ and $M_2$ is large, the conversion line of test scores can be very different from the conversion line of item difficulties, as the fictitious example has shown.

The foregoing investigation seems sufficient for us to draw the con- 
clusion that (5) and (6) can be approximately accurate only when the two 




groups are similar but larger error would result when the two groups differ 
substantially. How this kind of error can be systematically corrected is not 
known to the writer. It should be noted, however, that it is not related to 
sampling errors in the observed data [3]. Since we are dealing with fictitious
data, we may assume no error of measurement in the statistics of our fictitious
example. Since we know that (5) and (6) are the direct consequences of the
assumption of an identity of (3) and (4), then we also know that when the 
groups are different the fundamental assumption of the identity between 
test score conversion and item difficulty conversion, in such simple relations, 
is not met. 


REFERENCES
[1] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
[2] Thurstone, L. L. A method for scaling educational and psychological tests. J. educ. Psychol., 1925, 16, 433-451.
[3] Thurstone, L. L. Scale construction with weighted observations. J. educ. Psychol., 1928, 19, 441-453.
[4] Thurstone, L. L. The calibration of test items. Amer. Psychologist, 1947, 2, 103-104.


Original manuscript received 7/28/54 
Revised manuscript received 6/25/56 


PSYCHOMETRIKA—VOL. 22, NO. 2 
JUNE, 1957 


A METHOD OF SCORE CONVERSION THROUGH 
ITEM STATISTICS* 


FRANCES SWINEFORD 
AND 
CHUNG-TEH FAN
EDUCATIONAL TESTING SERVICE 


A method is presented for converting the scores on one form of a test to those on another form of the same test. The method is particularly applicable to the case where each form has been administered to a different group and the only link between the two forms is a subset of items common to both. The proposed method, called the item method of conversion, has been applied to several tests for which other methods of conversion are available for comparison. The necessary data are limited to tests for which the total score is the criterion for item analyses. The method gives highly satisfactory results for all the tests to which it has been applied, particularly when the two groups are rather different, in which case the delta method (a different item method) is inappropriate.


One of the problems arising in the construction of two (or more) forms 
of the same test is that of converting the scores to a common scale in order 
that the tests can be used interchangeably. In order to effect such a con- 
version, it is necessary to know, or to be able to estimate, the score statistics 
on both forms for the same group of examinees.

The most obvious procedure is to administer both forms to an experimental population. This procedure is sometimes followed, with special provisions made for taking into account possible practice effect. Procedures which involve an estimation of score statistics on one form from observed data for the other form, however, have a far wider range of applicability in practical testing work.

There are in current use at Educational Testing Service several methods for estimating the means and standard deviations on two forms of a test for the same group. Two of these methods, to which reference will be made later, are the part-score method of conversion and the delta method of conversion. Each has advantages and disadvantages, which cannot be fully presented here. The purpose of this paper is to present a new conversion procedure which gives promise of a high degree of accuracy. Comparisons have been made with the methods named above. The new method will be called the item method of conversion.


*The authors are only two of a group, including W. H. Angoff, F. M. Lord, and M. K. Schultz, all of whom have made important contributions to this paper.




Development of the Method 


The description of the method will start with the relationship between 
the test score distribution and item statistics for a single group taking a 
single test form. Let 


$p_j = R_j / N ,$

where $p_j$ is the proportion of correct responses for item $j$,
$R_j$ is the number of correct responses for item $j$,
$N$ is the number of cases in the test score distribution,

and let $m_j = (M_j - M)/\sigma$,

where $m_j$ is the mean standardized test score of those answering item $j$ correctly,
$M_j$ is the mean raw test score of those answering item $j$ correctly, and
$M$, $\sigma$ are the mean and standard deviation of the rights scores on the test.
For a test of n items the raw-score mean and standard deviation can be 
expressed in terms of the item statistics as follows: 


(1)   $M = \sum_{j=1}^{n} p_j ,$

(2)   $\sigma = \sum_{j=1}^{n} p_j m_j .$


Equation (1) is obvious and familiar. Equation (2) may need explanation. A quick derivation follows from two equations which have been given by Gulliksen [2]. In his Chapter 21, equation (20) is

(3)   $\sigma = \sum_{j=1}^{n} r_{jx} s_j ,$

and (32) is

(4)   $r_{jx} s_j = p_j (M_j - M)/\sigma ,$

where, in present notation,

$r_{jx}$ is the point biserial item-test correlation,
$s_j$ is the item standard deviation, which can be written $\sqrt{p_j (1 - p_j)}$.

Equation (2) can be obtained by substituting (4) in (3).
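Written out term by term, the substitution runs as follows. The first line recovers (4) from the standard point biserial formula $r_{jx} = \frac{M_j - M}{\sigma}\sqrt{p_j/(1-p_j)}$, which is not stated in the text and is supplied here only as a reminder; the second line then substitutes (4) into (3).

```latex
\begin{aligned}
r_{jx}\, s_j &= \frac{M_j - M}{\sigma}\sqrt{\frac{p_j}{1-p_j}}\;\sqrt{p_j(1-p_j)}
              = p_j\,\frac{M_j - M}{\sigma}
              && \text{which is (4)} \\
\sigma &= \sum_{j=1}^{n} r_{jx}\, s_j
        = \sum_{j=1}^{n} p_j\,\frac{M_j - M}{\sigma}
        = \sum_{j=1}^{n} p_j\, m_j
              && \text{(3) with (4), giving (2)}
\end{aligned}
```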

It should be noted that no assumptions or approximations were used in
the derivation of (1) and (2). When the appropriate item statistics are 
available they can be used to reproduce the mean and standard deviation of 
the test scores exactly. 
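The exactness of (1) and (2) is easy to check numerically. The sketch below computes $p_j$ and $m_j$ from a small hypothetical 0/1 response matrix and compares $\sum p_j$ and $\sum p_j m_j$ with $M$ and $\sigma$ computed directly. One detail the check makes visible: (2) is exact when $\sigma$ is computed with divisor $N$, since $\sum p_j m_j$ reduces to $\sum (x_i - M)^2 / (N\sigma)$. The function name `item_statistics` is illustrative only.

```python
import math


def item_statistics(responses):
    """Item and score statistics from a 0/1 response matrix (rows = examinees).

    Returns p, m, M, sigma where
      p[j] = R_j / N, the proportion answering item j correctly,
      m[j] = (M_j - M) / sigma, M_j being the mean rights score of those
             answering item j correctly,
      M, sigma = mean and standard deviation (divisor N) of the rights scores.
    Assumes the rights scores are not all equal (sigma > 0).
    """
    n_people, n_items = len(responses), len(responses[0])
    scores = [sum(row) for row in responses]
    M = sum(scores) / n_people
    sigma = math.sqrt(sum((x - M) ** 2 for x in scores) / n_people)
    p, m = [], []
    for j in range(n_items):
        right = [scores[i] for i in range(n_people) if responses[i][j] == 1]
        p.append(len(right) / n_people)
        # An item nobody passes has p_j = 0 and adds nothing to (1) or (2).
        M_j = sum(right) / len(right) if right else M
        m.append((M_j - M) / sigma)
    return p, m, M, sigma


responses = [  # hypothetical data: 5 examinees, 4 items
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
p, m, M, sigma = item_statistics(responses)
print(sum(p), M)                                     # equation (1)
print(sum(pj * mj for pj, mj in zip(p, m)), sigma)   # equation (2)
```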

Thus far the relationship between the test score statistics ($M$ and $\sigma$) and item statistics ($p_j$ and $m_j$) has been discussed only for a single group taking a single test form. When the same test is given to two groups, two sets of item statistics are available: $p_{j1}$ and $m_{j1}$, which can reproduce $M_1$ and $\sigma_1$ for Group 1, and $p_{j2}$ and $m_{j2}$, which can reproduce $M_2$ and $\sigma_2$ for Group 2.

The relationship between corresponding values of $p_{j1}$ and $p_{j2}$ is nonlinear, but a virtually linear relation may be expected between $x_{j1}$ and $x_{j2}$, where $x_j$ is the normal deviate above which $p_j$ of the area under the normal curve lies. The equation relating the two sets of normal deviates is

(5)   $(x_{j1} - M_{x1})/\sigma_{x1} = (x_{j2} - M_{x2})/\sigma_{x2} ,$

where $M_x$ and $\sigma_x$ are the mean and standard deviation, respectively, of $x_j$. It should be noted that the linear relation (5) will not hold if the test contains two or more kinds of material such that the two groups differ more with respect to one kind than with respect to the others, because of differences in sex or in special training or experience, for example.

A similar linear equation may be written as a close approximation for relating $m_{j1}$ and $m_{j2}$:

(6)   $(m_{j1} - M_{m1})/\sigma_{m1} = (m_{j2} - M_{m2})/\sigma_{m2} ,$

where $M_m$ and $\sigma_m$ are the mean and standard deviation of $m_j$.
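A minimal sketch of the two ingredients of (5) and (6), assuming Python's `statistics.NormalDist` for the normal curve and population standard deviations (`pstdev`); the text does not specify the divisor. `normal_deviate` returns the point above which a proportion $p$ of the area lies, so easy items get negative deviates. `mean_sigma_line` solves (5) or (6) for the Group 2 value. The p values are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()  # unit normal curve


def normal_deviate(p):
    """x_j: the normal deviate above which a proportion p of the area lies."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def mean_sigma_line(u1, u2):
    """Return a function mapping a Group 1 value u to Group 2 by setting
    (u2 - M_2) / sigma_2 = (u1 - M_1) / sigma_1, the form of (5) and (6)."""
    m1, s1 = mean(u1), pstdev(u1)
    m2, s2 = mean(u2), pstdev(u2)

    def to_group2(u):
        return m2 + (s2 / s1) * (u - m1)

    return to_group2


# Hypothetical p values for five common items in the two groups.
p_group1 = [0.85, 0.70, 0.55, 0.40, 0.25]
p_group2 = [0.75, 0.58, 0.42, 0.28, 0.15]
x1 = [normal_deviate(p) for p in p_group1]
x2 = [normal_deviate(p) for p in p_group2]
line_x = mean_sigma_line(x1, x2)   # relation (5)
print([round(line_x(x), 3) for x in x1])
```

The same `mean_sigma_line` applied to the $m_{j1}$ and $m_{j2}$ lists gives relation (6).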

Now let this test, for which (5) and (6) have been determined, be a set of common items in the two forms, Form Y and Form Z, taken by Group 1 and Group 2, respectively, and assume that the equations established through the common items can be applied to the rest of the items for determining $x_{j2}$ and $m_{j2}$ values from $x_{j1}$ and $m_{j1}$ values. The estimated mean and standard deviation on Form Y for Group 2 (for which the Form Z data are known) can be computed by formulas (1) and (2) from the estimated values of $m_{j2}$ and $p_{j2}$ (transformed from $x_{j2}$). The conversion equation relating the two forms can then be written from the observed mean and standard deviation on Form Z and the estimated mean and standard deviation on Form Y, all computed for the same group. The conversion equation is simply the formula derived by setting corresponding standard scores equal, and is written

$Y = (\sigma_y/\sigma_z)\, Z - (\sigma_y/\sigma_z)\, M_z + M_y .$
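The whole procedure can be sketched end to end as below. This is an illustration under stated assumptions, not the authors' computing routine: `item_method` and its arguments are hypothetical names, population standard deviations are used in (5) and (6), and every Form Y item (common items included) is carried over through the fitted lines rather than using the observed Group 2 values for the common items. The sanity check uses identical common-item statistics in both groups, so the lines reduce to the identity.

```python
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()  # unit normal curve


def normal_deviate(p):
    """x_j: the normal deviate above which a proportion p of the area lies."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def mean_sigma_line(u1, u2):
    """Linear relation of the form (5) or (6), fitted on the common items."""
    m1, s1, m2, s2 = mean(u1), pstdev(u1), mean(u2), pstdev(u2)
    return lambda u: m2 + (s2 / s1) * (u - m1)


def item_method(common_g1, common_g2, form_y_g1, mean_z, sd_z):
    """Estimate Form Y statistics for Group 2 and return a conversion.

    common_g1, common_g2: (p_j, m_j) pairs for the common items in Groups 1, 2
    form_y_g1: (p_j, m_j) pairs for every Form Y item, observed in Group 1
    mean_z, sd_z: observed Form Z mean and standard deviation for Group 2
    """
    line_x = mean_sigma_line([normal_deviate(p) for p, _ in common_g1],
                             [normal_deviate(p) for p, _ in common_g2])
    line_m = mean_sigma_line([m for _, m in common_g1],
                             [m for _, m in common_g2])
    # Carry each Form Y item over to Group 2: x_j2 from (5), then back to p_j2
    # as the area above x_j2; m_j2 from (6).
    p2 = [1.0 - STD_NORMAL.cdf(line_x(normal_deviate(p))) for p, _ in form_y_g1]
    m2 = [line_m(m) for _, m in form_y_g1]
    mean_y = sum(p2)                                  # formula (1)
    sd_y = sum(p * m for p, m in zip(p2, m2))         # formula (2)

    def convert(z):
        """Form Y equivalent of Form Z score z (equal standard scores)."""
        return (sd_y / sd_z) * z - (sd_y / sd_z) * mean_z + mean_y

    return mean_y, sd_y, convert


# Hypothetical numbers: identical common-item statistics in both groups.
common = [(0.80, 0.30), (0.60, 0.45), (0.40, 0.60)]
form_y = common + [(0.70, 0.40)]
mean_y, sd_y, convert = item_method(common, common, form_y, mean_z=2.4, sd_z=1.0)
print(round(mean_y, 6), round(sd_y, 6), round(convert(3.4), 6))
```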


It should be noted that this method of conversion is based on a minimum 
number of assumptions. The principal assumptions used are: 


1) There is a linear relation between $x_{j1}$ and $x_{j2}$.

2) There is a linear relation between $m_{j1}$ and $m_{j2}$.

3) The common items have been selected to represent the remaining items, and therefore the linear relation of $x_j$ and $m_j$ between the two groups established through common items can be applied to all the items.

The three foregoing assumptions can generally be fulfilled in practice.




The first two assumptions can be checked by their plots and their correlations. 
If the plots are linear and the correlations very high, these assumptions have been
met. The third assumption requires specific attention by the test constructor, 
but this assumption cannot be avoided by any method of conversion which 
uses common items as a link. 
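As a numerical companion to the plots, the correlation between the common-item deviates $x_{j1}$ and $x_{j2}$ (or between $m_{j1}$ and $m_{j2}$) can be computed as sketched here. The deviates below are hypothetical, and the text gives no cut-off for "very high," so none is assumed.

```python
import math


def pearson_r(u, v):
    """Product-moment correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical normal deviates x_j1 and x_j2 for five common items.
x_group1 = [-1.04, -0.52, -0.13, 0.25, 0.67]
x_group2 = [-0.67, -0.20, 0.20, 0.58, 1.04]
print(round(pearson_r(x_group1, x_group2), 4))
```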

The item method of conversion has been compared with several other
conversion procedures using real data. The test material in each case is 
homogeneous in nature, so that (5) describes the data with a high degree of 
accuracy. The most stringent check on the method consisted of equating a 
test to itself—a “conversion” which would never be required in practice but 
which serves admirably to identify weaknesses in conversion procedures. 
The 50-item test had been administered to two significantly different groups. 
The delta method of conversion, identical in principle with the Thurstone 
method of absolute scaling described by Fan [1], proved unsatisfactory, 
whether twenty or thirty items were treated as common to “both” forms, 
thus supporting the argument advanced by Fan. The part-score method 
was acceptable, whether based on twenty or on thirty items. The item 
method, also tried with twenty and with thirty items, reproduced the actual 
scores with even less error than did the part-score method, although additional 
evidence will be required to establish the superiority of either of these two 
procedures over the other. 

The results from these and other comparisons are promising. (A more 
complete description of the numerical examples can be obtained from the 
authors.) The item method can probably be used successfully with any test 
for which the total score is appropriately used as the criterion for item 
analysis. Applications to tests where part-score criteria are used have not 
yet been attempted. 


REFERENCES 
[1] Fan, C. T. On the applications of the method of absolute scaling. Psychometrika, 1957, 22, 175-183.
[2] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950. 
Original manuscript received 7/28/54 


Revised manuscript received 6/25/56 




PSYCHOMETRIKA—VOL, 22, No. 2 
JUNE, 1957 


A REVISED LAW OF COMPARATIVE JUDGMENT*


WILLIAM P. HARRIS


LINCOLN LABORATORY 
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


In contrast to Thurstone's Law of Comparative Judgment, a model in which a comparison pair and its complement are assumed to give rise to two different distributions of differences is considered. The consequences of this revised model for scaling problems are developed.


This paper presents a more general model of paired comparison scaling that follows from, but is less restrictive than, Thurstone's [7] original statement of the Law of Comparative Judgment.

Consider the Law of Comparative Judgment, Case V, as usually written:

(1)   $S_i - S_j = X_{ij} ,$

where $X_{ij}$ is the normal deviate corresponding to the proportion of occasions in which stimulus $i$ is judged greater than $j$, and $S_i$ and $S_j$ are the scale values or apparent magnitudes of the stimuli. The model was designed specifically to provide for the scaling of a single set of stimuli of which $i$ and $j$ are two members. It is assumed that when $i$ is the same as $j$ the effects of the stimulus on the judgment behavior, i.e., the scale values $S_i$ and $S_j$, are equal. Let this be called the identity assumption. In a strict theoretical sense, not without important consequences, $i$ and $j$ can never be alike: by definition each stimulus in a pair must maintain separate identity for the subject, and therefore the stimuli must be in some sense physically different. Typically, they are separated spatially or temporally: one is on the right and the other on the left, or one is presented first and the other second. Since position in the pair may have an effect on the apparent magnitude of the stimulus, it cannot be assumed a priori that the values $S_i$ and $S_j$ are necessarily equal. Time error and position preference are well-known phenomena that contradict this identity assumption.

The identity assumption is implicit in the usual computational procedures for paired comparison data, specified originally by Thurstone [7]. A kind of symmetry is forced on the matrix of proportions by an averaging procedure such that $P_{i>j}$, the proportion of judgments $i$ greater than $j$, equals $1 - P_{j>i}$. The result is a somewhat arbitrary cancelling out of any effect of bias due to position in the comparison pair. Also, no provision is made for using entries in the diagonal of the matrix if these proportions differ from .50.

*The research in this article was supported jointly by the Army, Navy, and Air Force under contract with the Massachusetts Institute of Technology.

Briefly, this paper presents a model in which a comparison pair and its complement are assumed to give rise to two different distributions of differences on the psychological continuum; each distribution of differences over trials is assumed to be normal, but the distributions may have different means (and, ultimately, even different variances). If this assumption is ever correct, then the original model is technically inaccurate in that the average of two proportions, each from a different normal distribution, is not itself a proportion in a normal distribution. However, since in practice the assumption of a normal distribution is not crucial (as Mosteller [5] says, it "is more in the nature of a computational device than anything else"), it would be a mistake to assume that the usual computational procedures are seriously in error in this sense. In another way, averaging a matrix that exhibits bias can have a more serious consequence: as Mosteller [5] points out in connection with his test of goodness of fit, the chi square values tend to be spuriously low for such data. Thus, in addition to providing an explicit means of evaluating bias, the proposed method restores the validity of the statistical test in the event of such an effect.
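The traditional averaging, and the point about averaged proportions, can be illustrated with a hypothetical 2 x 2 matrix whose diagonal shows a position bias. The sketch averages $P_{i>j}$ with $1 - P_{j>i}$, which forces the diagonal to .50, and then shows that the normal deviate of an averaged proportion differs from the average of the two deviates it came from. `symmetrize` is an illustrative name, not a procedure named in the text.

```python
from statistics import NormalDist


def symmetrize(P):
    """Traditional averaging: replace P[i][j] by (P[i][j] + 1 - P[j][i]) / 2."""
    k = len(P)
    return [[(P[i][j] + 1.0 - P[j][i]) / 2.0 for j in range(k)]
            for i in range(k)]


# Hypothetical 2 x 2 matrix with a position bias: the diagonal is not .50.
P = [[0.60, 0.80],
     [0.40, 0.62]]
S = symmetrize(P)
print(S)

# Averaging proportions is not the same as averaging normal deviates.
z = NormalDist().inv_cdf
deviate_of_average = z(S[0][1])                           # deviate of (.80 + .60) / 2
average_of_deviates = (z(P[0][1]) + z(1 - P[1][0])) / 2   # mean of the two deviates
print(round(deviate_of_average, 4), round(average_of_deviates, 4))
```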

Since theoretically the identity assumption is never correct, and since 
it is inaccurate in certain applications, there is every advantage in discarding 
this restriction on the model. It is demonstrated below that a simple extension 
of the usual procedures leads to a meaningful paired comparison scale without 
the restrictive assumption. It is shown that the revised model has an im- 
portant consequence for scaling in the area of sensory psychophysics. The 
use of a more efficient experimental design of the paired comparison experiment 
is suggested for attitude measurement. Finally, it is shown that Thurstone’s 
model for successive intervals data follows more directly from the more 
general model of paired comparisons proposed here. 


The Revised Model


To emphasize that location of the stimulus may affect its apparent value, let Case V of the Law of Comparative Judgment be written

(2)   $A_i - B_j = X_{ij} .$

A may represent the apparent magnitudes of the stimuli on the right and B on the left. Or A may be the first in order and B the second. To obtain a set of scale values, the $P_{i>j}$ are used as obtained, and no attempt is made to average $P_{i>j}$ with $1 - P_{j>i}$ across the diagonal. In some cases diagonal entries are not obtained because the comparison of like stimuli is absurd, as is usually the case for application of the model in attitude measurement. These cells may be treated as unfilled cells in the matrix. No entry need be assumed, since a procedure will be discussed that permits scaling of incomplete matrices. All empirical values of $P_{i>j}$ are converted to $X_{ij}$ by the normal integral transformation in the usual manner.
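The normal integral transformation, with unfilled cells kept as gaps, might be sketched as follows. `None` marks an unfilled cell. The default cut-offs reproduce the rule stated in the footnote to Table 1 (proportions below .05 or above .95 omitted); treating those cells as unfilled in general is an assumption of this sketch.

```python
from statistics import NormalDist


def proportions_to_deviates(P, lo=0.05, hi=0.95):
    """Convert P[i][j] (proportion of times i judged greater than j) to X[i][j].

    None marks an unfilled cell: either no observation, or a proportion outside
    [lo, hi] treated as too unstable to scale.
    """
    z = NormalDist().inv_cdf
    return [[z(p) if p is not None and lo <= p <= hi else None for p in row]
            for row in P]


X = proportions_to_deviates([[0.50, 0.99], [0.02, None]])
print(X)
```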
The least square solution of (2) for a complete matrix of data is given by

(3)   $A_i = \left(\sum_{j=1}^{n} B_j / n\right) + \left(\sum_{j=1}^{n} X_{ij} / n\right)$

and

(4)   $B_j = \left(\sum_{i=1}^{m} A_i / m\right) - \left(\sum_{i=1}^{m} X_{ij} / m\right) .$

This follows as a simple extension of the solution for symmetric matrices demonstrated by Mosteller [4]. Since the origin of the scale is indeterminate, it is convenient to set the average value of, say, $B_j$ equal to zero. Substituting zero into (3) gives

(3a)   $A_i = \sum_{j=1}^{n} X_{ij} / n .$

The values of $A_i$ obtained from (3a) can be substituted in (4) to obtain $B_j$. As a check, the sum of $B_j$ should be zero.
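For a complete $m \times n$ matrix, (3a) and (4) translate directly into code. The sketch below builds $X_{ij}$ from hypothetical known values (with the $B_j$ averaging zero), recovers them, and confirms the check that the $B_j$ sum to zero. `solve_complete` is an illustrative name.

```python
def solve_complete(X):
    """Least squares scale values for A_i - B_j = X_ij, complete m x n matrix.

    Uses (3a) for A_i (origin fixed so that the B_j average zero) and then (4)
    for B_j.
    """
    m, n = len(X), len(X[0])
    A = [sum(row) / n for row in X]                                     # (3a)
    mean_A = sum(A) / m
    B = [mean_A - sum(X[i][j] for i in range(m)) / m for j in range(n)]  # (4)
    return A, B


# Hypothetical check: build X from known values, then recover them.
A_true = [1.0, 2.0, 3.5]
B_true = [-0.5, 0.0, 0.5]          # chosen to average zero
X = [[a - b for b in B_true] for a in A_true]
A, B = solve_complete(X)
print(A, B, sum(B))
```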

For an incomplete matrix, a least square solution of (2) can be obtained by the principle proposed by Gulliksen [3]. Find an approximate set of values of $A_i$ (or $B_j$) by any convenient method. For example, values can be obtained by Thurstone's procedure of finding the mean difference between pairs of values in adjacent rows or columns as an estimate of scale separation. Given, say, values of $A_i$, find values of $B_j$ from (4). The sums apply only to filled cells, and the divisor ($m$ for a column, $n$ for a row) must be adjusted for each column or row. For example, in the $j$th column of a matrix there may be values of $X_{2j}$, $X_{3j}$, and $X_{4j}$. The other entries are missing, say, because the obtained proportions were too small or too large to give stable estimates. The first term in (4) is given by the sum $(A_2 + A_3 + A_4)/m$, where $m = 3$, the number of filled cells in the column. The second term is the mean of the filled cells. With values of $B_j$ obtained in this way, a new set of estimates of $A_i$ is obtained from (3) in an analogous manner, that is, by summing over the filled cells in the $i$th row and adjusting the value of $n$ according to the entries in the row. Iterate until the sum of squared errors reaches a stable state for the desired decimal accuracy of scale values.
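The iteration for an incomplete matrix can be sketched as alternating (4) and (3) over filled cells only, stopping when the sum of squared errors no longer decreases by more than a tolerance. Assumptions of this sketch, not of the text: starting values are the row means of the filled cells (the text allows any convenient method, such as Thurstone's differences), every row and column has at least one filled cell, the filled cells link all stimuli together, and the final values are shifted so the $B_j$ average zero. The test matrix keeps only cells near the diagonal.

```python
def solve_incomplete(X, tol=1e-12, max_iter=10_000):
    """Alternate (4) and (3) over filled cells only (None = unfilled cell).

    Returns A, B (shifted so the B_j average zero) and the final sum of
    squared errors over the filled cells.
    """
    m, n = len(X), len(X[0])
    rows = [[j for j in range(n) if X[i][j] is not None] for i in range(m)]
    cols = [[i for i in range(m) if X[i][j] is not None] for j in range(n)]
    # Starting values: mean of the filled cells in each row.
    A = [sum(X[i][j] for j in rows[i]) / len(rows[i]) for i in range(m)]
    B = [0.0] * n
    prev_sse = float("inf")
    for _ in range(max_iter):
        # (4) over filled cells: mean(A_i) - mean(X_ij) = mean(A_i - X_ij)
        B = [sum(A[i] - X[i][j] for i in cols[j]) / len(cols[j])
             for j in range(n)]
        # (3) over filled cells: mean(B_j) + mean(X_ij) = mean(B_j + X_ij)
        A = [sum(B[j] + X[i][j] for j in rows[i]) / len(rows[i])
             for i in range(m)]
        sse = sum((A[i] - B[j] - X[i][j]) ** 2
                  for i in range(m) for j in rows[i])
        if prev_sse - sse < tol:   # stable to the desired accuracy
            break
        prev_sse = sse
    shift = sum(B) / n
    return [a - shift for a in A], [b - shift for b in B], sse


# Hypothetical check: keep only cells on or next to the diagonal.
A_true = [0.0, 1.0, 2.0, 3.0]
B_true = [-1.5, -0.5, 0.5, 1.5]
X = [[A_true[i] - B_true[j] if abs(i - j) <= 1 else None for j in range(4)]
     for i in range(4)]
A, B, sse = solve_incomplete(X)
print([round(a, 4) for a in A], [round(b, 4) for b in B])
```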

The chi square test of goodness of fit of the model proposed by Mosteller 
[5] is readily adapted to the scaling of asymmetric matrices. Use of this test 
is illustrated below.


Application to Some Data on Visual Judgments of Line Lengths 


In order to demonstrate the application of (2), paired comparison data were gathered on line-length judgments with the horizontal-vertical illusion
as a source of bias. Two thin, black lines were presented as the comparison
pair in the form of a cross, centered on white cards ten inches on a side. The 
lines varied in seven equal steps of one-sixteenth of an inch from 3 13/16 to 
4 3/16 inches. The viewing distance was about 30 inches. All of the forty-nine 
possible pairs were presented five times to each of ten subjects—except that 
the seven pairs of equal length appeared ten times per subject. 

The matrix of proportions pooled over subjects is shown in Table 1. 


TABLE 1

Proportion of Times the Vertical Line Was Judged Greater Than the Horizontal, and Scale Values*

[Table body: rows are vertical lines 1 to 7, columns are horizontal lines 1 to 7, with a final column of scale values. Individual proportions are not legible in the scanned source; the legible scale values for vertical lines 2 to 7 are 1.43, 1.66, 2.00, 2.49, 3.03, and 3.82.]

*Proportions less than .05 and greater than .95 omitted.


Inspection of the major diagonal shows that the vertical lines were consistently judged longer when the two lines were equal in length. Likewise, the sums of complementary pairs of off-diagonal entries are consistently greater than unity. The vertical and horizontal lines were scaled separately by the procedure for an incomplete matrix outlined above. The iterative procedure began with estimates of $B_j$ obtained by the method of differences between adjacent columns. The error variance in estimating the empirical values of $X_{ij}$ was computed for each of four complete iterations, i.e., for each new set of values of both $A_i$ and $B_j$. The percentages of error variance were 8.9339, 8.4221, 8.4169, and 8.4154; thus the scale values of the fourth iteration were judged to be adequately precise.

The scale values are plotted against each other in Figure 1. Each point represents the scale values of a pair of lines of equal length. The straight line fitted to the data passes through the point representing the mean scale values and has unit slope. To the extent that the fit is satisfactory, the result shows that the scale values are linearly related and that the bias due to the illusion is a constant regardless of scale value. Thus, it seems legitimate in this case to find the average scale value for line length regardless of position. Such a scale obviously would not differ appreciably from that obtained had values of $P_{i>j}$ and $1 - P_{j>i}$ been averaged across the diagonal. However, it is clear that the new procedure has a sounder basis because it permits explicit examination of the interaction of bias with scale value. It also provides a basis for using the biased entries in the main diagonal, a procedure in contrast to the traditional one.

[Figure 1: scale values (apparent length, arbitrary units) of the vertical line plotted against those of the horizontal line; fitted line, slope 1.00.]

FIGURE 1
Plot of scale values of vertical versus horizontal lines of equal length. The fitted line is of unit slope and passes through the mean.

Besides the average scale values for length, a single number represents the average bias, the effect of illusion, as shown in Figure 1. It is in the same psychological units as the scale of apparent length. This result may be contrasted with the measure of illusion effect obtained by the Method of Limits. By this method the Point of Subjective Equality would be obtained, and the amount of bias would be expressed in physical units. It is clear that, in general, the interaction between the bias and main variables will be different if measured in equivalent physical units rather than in psychological units, whenever the physical and psychological scales are nonlinearly related. This problem is not encountered in these data because, as shown in Figure 2, there is little cause to reject the hypothesis that actual and apparent lengths are linearly related in the narrow region of the experiment.


[Figure 2: average apparent length versus line length (inches, 3.8 to 4.2); curves labeled VERTICAL, AVERAGE, and HORIZONTAL.]

FIGURE 2
Average apparent length versus actual length of line. Solid line fitted to data points by inspection. The dashed lines represent the scales expected when the illusion effect is accounted for.


The model for a constant additive bias may be written explicitly

(5)   $S_i - S_j + b = X_{ij} ,$

where $S_i$ and $S_j$ have the same numerical value when $i = j$, and $b$ is the bias effect. Although a least square solution to (5) may be obtained, it is somewhat tedious; it is likely that average values of corresponding values of $A_i$ and $B_j$ will serve in most practical applications. For the illusion data, initial values of $S$ and $b$, estimated from weighted averages of $A_i$ and $B_j$, were only trivially different from least square parameters.
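The averaging shortcut can be sketched as follows for stimuli scaled in both positions: $b$ is taken as the mean difference between corresponding $A_k$ and $B_k$, and $S_k$ as their mean, which reproduces $A_i - B_j$ exactly when the bias is constant. Unweighted averages are used here as a simplification; the text used weighted averages, whose weights it does not give. The function name and data are hypothetical.

```python
def constant_bias_estimates(A, B):
    """Estimate S_k and b in (5) from the scale values of the same stimuli in
    the two positions, A[k] (e.g. vertical) and B[k] (e.g. horizontal).

    Unweighted averages: b = mean(A_k - B_k), S_k = (A_k + B_k) / 2.
    """
    k = len(A)
    b = sum(a_k - b_k for a_k, b_k in zip(A, B)) / k      # average bias
    S = [(a_k + b_k) / 2.0 for a_k, b_k in zip(A, B)]     # average scale value
    return S, b


# Hypothetical scale values with a constant bias of 0.4 in favor of A.
B = [1.0, 1.6, 2.3]
A = [x + 0.4 for x in B]
S, b = constant_bias_estimates(A, B)
print([round(s, 3) for s in S], round(b, 3))
```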

For illustrative purposes, two chi square tests of goodness of fit were carried out, one for separate and one for combined scale values. It will be remembered that Mosteller's [5] formula is

$\chi^2 = \sum \dfrac{(\theta_{ij} - \theta'_{ij})^2}{821/N} ,$

where $\theta_{ij}$ and $\theta'_{ij}$, for empirical and computed proportions, respectively, are given in degrees by

$\theta_{ij} = \arcsin \sqrt{P_{i>j}} .$

$N$ is the number of judgments per stimulus. Since $N$ was 100 for diagonal entries and 50 for off-diagonal entries, the sums of chi squares were computed separately for the two sections of the matrix and added together. The number of degrees of freedom for separate scales for vertical and horizontal is 23: 36 empirical values are estimated from 13 scale differences, i.e., 14 scale values minus one parameter for the arbitrary origin. For the constant bias model there are seven parameters, six scale differences plus one constant of bias, and therefore 29 (= 36 - 7) degrees of freedom. Both tests were highly significant ($p < .001$), probably due to heterogeneity of the results over individuals.
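A sketch of the arcsine chi square and the degrees-of-freedom arithmetic quoted above. The constant 821 is the variance of $\arcsin\sqrt{P}$ in degrees for one judgment, $(180/\pi)^2/4 \approx 820.7$. The helper `fitted_proportion` assumes Case V, where the computed proportion is the normal integral of $A_i - B_j$; the function names are illustrative.

```python
import math
from statistics import NormalDist


def theta(p):
    """Arcsine transformation in degrees: arcsin(sqrt(p))."""
    return math.degrees(math.asin(math.sqrt(p)))


def fitted_proportion(a_i, b_j):
    """Computed proportion under (2): normal integral of A_i - B_j."""
    return NormalDist().cdf(a_i - b_j)


def mosteller_chi_square(cells):
    """cells: (observed P, computed P, N) for each filled cell of the matrix.
    Each term is (theta - theta')^2 / (821 / N)."""
    return sum((theta(po) - theta(pf)) ** 2 / (821.0 / n)
               for po, pf, n in cells)


# Degrees of freedom quoted in the text for the 36 filled cells.
df_separate = 36 - 13        # 14 scale values minus 1 for the origin
df_constant_bias = 36 - 7    # 6 scale differences plus 1 bias constant
print(df_separate, df_constant_bias)   # 23 29
```

Because $N$ differs between diagonal and off-diagonal cells, carrying $N$ per cell lets one call cover both sections of the matrix.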

The analysis of these data indicates that for scaling attributes of sensation the method has more significance than merely handling an annoying and trivial kind of bias. Typically the psychological magnitude of a stimulus is related principally to one physical property of the stimulus. For example, loudness is chiefly affected by the intensity of a tone. However, the apparent magnitude may also be affected to a small but significant extent by other physical parameters; e.g., loudness is also a function of frequency. It is obvious that the variable that permits identification of each member of a pair of stimuli, i.e., spatial location or temporal order, may itself have an effect of importance. This more general interpretation of Thurstone's principle permits direct measurement of the effect of such a secondary parameter on the same psychological scale as that for the primary variable.


Implications for Attitude Measurement 


It is apparent that for scaling stimulus objects that cannot be ordered along a physical continuum, in the study of attitudes, aesthetic preference, and the like, the scaling of bias is of secondary importance. Although the procedure outlined above should be carried out routinely on any paired comparison data, it is unlikely that position in the pair will often be a significant variable. Clearly, if the two scales are related by a line of unit slope, an average scale is meaningful. This is true even if there is a constant additive bias. There is, however, an important practical consequence of a general fact that has not yet been made apparent. Equation (2) and the procedures that follow apply even if the two sets of stimuli have no members in common whatever. That is, for example, "apple" or "orange" or "pear" may appear on the left, each paired with "banana" or "grapes" or "tangerine" on the right. If the procedure for solving (2) is carried through, the value of each fruit can be determined on a single scale of preference. Indeed, the number of stimuli in each location need not even be the same. Accordingly, (3) and (4) were written for different numbers of stimuli, m and n. If there is a satisfactory fit of this unidimensional model of stimulus effect, then the conclusion that the scale measures a single attribute of the stimulus is warranted, whatever the sources of variance. Of course, in this extreme case the use of mutually exclusive classes of stimuli in each location permits no evaluation of the effects of location apart from the principal effect of the stimulus. It may be satisfactory, however, to include only a few probe stimuli, common to both locations in the pair and scattered over the range of all stimuli on the continuum. The results for the common stimuli would indicate the extent of bias. Such a procedure, taken in conjunction with the fact that least square solutions for incomplete matrices are possible, leads to the practical consideration that the number of stimuli can be greatly enlarged at little expense by deliberately omitting many paired comparisons.


Implications for the Method of Successive Intervals 


Recently there has been a revival of interest in a paired comparison model of successive intervals data that Saffir [6] attributes to Thurstone. Briefly, the scaling procedure follows from the assumption that the boundaries between the successive intervals form a hypothetical, ordered set of stimulus effects on the psychological continuum. Each judgment of a stimulus $i$ represents, in effect, that it lies above some interval boundary $g$ and below $g + 1$. From the (reverse) cumulated frequency of judgments of a stimulus over the set of categories, the proportion of times $i$ lies above $g$ may be obtained. This is interpreted according to Thurstone's principle by converting the proportion to a distance between stimulus and category boundary. For example, Case V is written

(8)   $S_i - T_g = X_{ig} ,$

in which $S_i$ is the apparent magnitude of the stimulus, $T_g$ is the scale value of a boundary between intervals, and $X_{ig}$ is the unit normal deviate corresponding to the proportion of times stimulus $i$ is judged greater than boundary $g$. The application of this model is discussed briefly by Green [2] and will be dealt with extensively in a forthcoming monograph by W. S. Torgerson.
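The conversion from category frequencies to the deviates $X_{ig}$ of (8) can be sketched for one stimulus as below. Boundary $g$ separates category $g$ from category $g + 1$, and the proportion of times the stimulus lies above it is the reverse-cumulated frequency. Marking proportions outside [.05, .95] as unfilled borrows the rule of Table 1 and is an assumption here; the resulting row of $X_{ig}$ would then enter a matrix solved like (2). The frequencies are hypothetical.

```python
from statistics import NormalDist


def boundary_deviates(freqs, lo=0.05, hi=0.95):
    """X_ig for one stimulus i from its frequencies over ordered categories.

    freqs[k] = number of judgments placing the stimulus in category k + 1.
    Returns one value per boundary g = 1 .. K-1; None marks a proportion
    outside [lo, hi] (treated as an unfilled cell).
    """
    total = sum(freqs)
    z = NormalDist().inv_cdf
    out = []
    for g in range(1, len(freqs)):
        p_above = sum(freqs[g:]) / total   # reverse-cumulated proportion
        out.append(z(p_above) if lo <= p_above <= hi else None)
    return out


print(boundary_deviates([10, 20, 40, 20, 10]))
```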

It is clear that the members of the stimulus pair in (8) are very different indeed. $S_i$ is an effect due to a stimulus object presented to the subject. However, $T_g$ is in the extreme an entirely hypothetical construct, such as in the case of absolute, numerical judgments of the magnitude of single stimuli. In that case the boundary is dealt with as a stimulus effect expected on the basis of instructions to the subject and confirmed by the fit of the model. It is clear that application of (8) to successive intervals data involves two assumptions: first, the Law of Comparative Judgment is meaningful without the restriction to stimulus pairs drawn from a common set, and, second, the boundaries between successive intervals may be interpreted as stimuli. The first assumption is that made in the interpretation of (2) for paired stimulus objects. Thus it is clear that the successive intervals model follows directly from the new interpretation of Thurstone's principle and not from the model implied by the usual computational procedures. Of course, both assumptions underlying the successive intervals model suggest difficulties of interpreting the significance of the scale values obtained. These problems are dealt with elsewhere. However, it is worth commenting that Edwards [1], in a study of food preference, found close agreement between scale values for stimuli by paired comparison and successive intervals methods; presumably his treatment of each set of data differs only in unimportant details from that recommended here.

As far as computational procedures are concerned, the methods of 
determining scale values are the same for (8) as for (2). At this point, it 
should be noted that when empirical values of X are present only near
the diagonal, as is often the case for successive intervals data, it is the writer's
experience that the iterative procedure may converge slowly to the least 
square scale values. One way to understand this circumstance is to realize 
that two distant stimuli are tied together solely by their distances from 
intermediate stimuli and not by any direct estimates. Another way to look at 
the matter is to note that the more holes in the matrix, the more weight is 
carried, in effect, by old estimates of scale values in estimating new values. 
Of course, the better the estimates of the proportions and the fit of the model, 
the fewer iterations will be required. 


The Complete Form of the Model 
The statement of the complete model is

$A_i - B_j = X_{ij} \sqrt{a_i^2 + b_j^2 - 2 r_{ij} a_i b_j} ,$

in which the term under the radical allows for the variance associated with the difference between scale values. Different discriminal dispersions, $a_i$ and $b_j$, are assigned to different members of the stimulus pair in a manner analogous to the assumption for scale values.

For Case V, it may be assumed that the correlation is nearly constant and, following Mosteller [4], not necessarily zero. It is also reasonable to let the variance associated with a stimulus in each position of the pair be constant. Then, application of (2) implies only that

$a^2 + b^2 - 2rab = 1 .$
Specifically it is not necessary to assume that Case V is restricted by the 
assumption that the dispersion of stimuli in one position of the pair 1s 
necessarily equal to the dispersion of those in the other. As pointed out by 
Torgerson [8], this statement has important consequences for the use of the 
model to fit successive intervals data in which the members of the pair are 


198 PSYCHOMETRIKA 


so 


radically different. For example, in scaling sensory attributes it is often 


likely in practice that the variance associated with the stimulus is small 
relative to that of the category boundaries. 


It is not essential here to carry the discussion beyond considerations of
Case V, which is adequate for many applications. Other assumptions may be
made and their implications written into an equation as dictated by the
conditions and the results of specific experiments.


REFERENCES

[1] Edwards, A. L. The scaling of stimuli by the method of successive intervals. J. appl.
Psychol., 1952, 36, 118-122.

[2] Green, B. F. Attitude measurement. In G. Lindzey (Ed.), Handbook of social
psychology. Cambridge 42, Mass.: Addison-Wesley Publishing Co., 1954. Pp. 335-369.

[3] Gulliksen, H. A least squares solution for paired comparisons with incomplete data.
Psychometrika, 1956, 21, 125-134.

[4] Mosteller, F. Remarks on the method of paired comparisons: I. The least squares
solution assuming equal standard deviations and equal correlations. Psychometrika,
1951, 16, 3-9.

[5] Mosteller, F. Remarks on the method of paired comparisons: II. A test of significance
for paired comparisons when equal standard deviations and equal correlations are
assumed. Psychometrika, 1951, 16, 207-218.

[6] Saffir, M. A. A comparative study of scales constructed by three psychophysical
methods. Psychometrika, 1937, 2, 179-198.

[7] Thurstone, L. L. Psychophysical analysis. Amer. J. Psychol., 1927, 38, 368-389.

[8] Torgerson, W. S. A law of categorical judgment. In L. S. Clark (Ed.), Consumer
behavior. New York: New York Univ. Press, 1954. Pp. 92-94.


Original manuscript received 8/14/56 


Revised manuscript received 11/15/56




PSYCHOMETRIKA—VOL. 22, No. 2 
JUNE, 1957 


A FAST APPROXIMATE ALGEBRAIC FACTOR ROTATION
METHOD TO MAXIMIZE AGREEMENT BETWEEN LOADINGS 
AND PREDETERMINED WEIGHTS 


DAVID A. RODGERS


UNIVERSITY OF CALIFORNIA 


A method is presented for rotating a set of orthogonal axes into a reference frame
in which the loadings are as nearly proportional to a predetermined set of
weights as possible. The method, an approximate algebraic solution, often requires
additional graphical refinement but eliminates most of the labor involved in usual
graphical solutions. Its primary value is speed and simplicity of calculation,
involving only one matrix multiplication and solution of a simple formula to
determine the rotation cosines.


The method of factor rotation presented here is a computationally useful
supplement to graphic procedures when it is desired to rotate orthogonal
reference axes, e.g., a centroid solution, into a reference frame on which the
loadings correspond maximally to a predetermined hypothetical set of weights.
The hypothetical weights might correspond to predetermined simple structure
loadings or to graded values along some variable that might conceivably be
present in the factor space. The method was originally used to determine whether
there was a dimension in the factor space, determined by the intercorrelations
of salesmen's self-concept Q-sorts, that would rank the salesmen in the same way
that they were known to be ranked by their supervisor on the basis of selling
ability.
The problem is to maximize the correlation between a predetermined
set of weights and the desired factor loadings. Mosier [2] and Tucker [5]
have offered exact solutions for this problem, applying them to the determi-
nation of simple structure. When computational labor is not a problem, or
when considerable investment has been made in determining the hypothetical
weights to be matched, as in Eysenck's "criterion analysis" [1], Tucker's or
Mosier's solution is to be preferred. However, these solutions involve lengthy
computational procedures, including calculation of an inverse matrix, that
may seem unwarranted if the weights to be matched are only rough ap-
proximations to presumed actual factor structure (as in the example of
ranking of salesmen) and/or if only one or two reference axes are sought
instead of a complete reference frame.

A less precise but much faster solution can be achieved by maximizing
the sum of cross products of the arbitrary weights, corrected to a zero mean,
with the obtained factor loadings. This maximizes the $xy$ cross-product
sum in the standard product-moment correlation formula,


$$r = \frac{\sum xy}{N\sigma_x\sigma_y}, \qquad (1)$$


in which $x$ is the arbitrary set of weights and $y$ the desired loadings. $N$ and
$\sigma_x$ are fixed quantities determined by the weights chosen and the number
of tests involved; $\sigma_y$ is the only uncontrolled variable in such an approximation.
The approximation normally yields loadings with greater dispersion than 
would be obtained by an exact solution. The results are usually close enough 
to the desired solution that only slight additional adjustment by graphical 
methods is necessary. 


Derivation of Formulas 


Let the cosines between orthogonal factors I, II, ..., N and the desired
factor F be x, y, ..., u, respectively. Let the arbitrary weights be $W_1, W_2, \ldots$,
so chosen that they sum to zero. Let the desired sum of cross products
between the factor loadings on F and the arbitrary weights be T. Let the sum
of the cross products of the arbitrary weights with the corresponding loadings
on Factor I be a. Similarly, let the sums of the cross products of the weights with
the factor loadings on the other orthogonal factors II, ..., N be b, ..., n, respectively.
Then

$$T = ax + by + \cdots + nu, \qquad (2)$$

and

$$x^2 + y^2 + \cdots + u^2 = 1, \qquad (3)$$

so that

$$T = ax + by + \cdots + n\sqrt{1 - x^2 - y^2 - \cdots}. \qquad (4)$$

T will be a maximum when

$$\frac{\partial T}{\partial x} = \frac{\partial T}{\partial y} = \cdots = 0. \qquad (5)$$

Therefore, for T equal to a maximum,

$$\frac{\partial T}{\partial x} = a - \frac{nx}{\sqrt{1 - x^2 - y^2 - \cdots}} = a - \frac{nx}{u} = 0, \qquad (6)$$

and

$$\frac{\partial T}{\partial y} = b - \frac{ny}{u} = 0. \qquad (7)$$

DAVID A. ROGERS 201

Therefore

$$\frac{x}{a} = \frac{y}{b} = \cdots = \frac{u}{n} = K.$$

But from (3)

$$K^2a^2 + K^2b^2 + \cdots + K^2n^2 = 1. \qquad (8)$$

Therefore

$$\frac{1}{K^2} = a^2 + b^2 + \cdots + n^2. \qquad (9)$$

Since

$$\frac{x}{a} = K, \qquad (10)$$

then

$$x = \pm\frac{a}{\sqrt{a^2 + b^2 + \cdots + n^2}}, \qquad (11a)$$

$$y = \pm\frac{b}{\sqrt{a^2 + b^2 + \cdots + n^2}}, \qquad (11b)$$

and

$$u = \pm\frac{n}{\sqrt{a^2 + b^2 + \cdots + n^2}}. \qquad (11n)$$

Since a, b, ..., n can be determined, and since x, y, ..., u are the desired
rotation cosines, the desired factor can be determined algebraically by use of
the above formulas. The signs chosen for x, y, ..., u are the same as the signs
of the values obtained for a, b, ..., n, respectively. In matrix terms, if W is the
matrix of arbitrary weights and $F_0$ the orthogonal factor matrix, then a, b, ..., n
are the elements in the F column (the desired reference axis) of the $F_0'W$ matrix.
The rotation cosines are then obtained by formula (11) from these column entries.
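The procedure can be sketched in a few lines of Python. This is a minimal sketch, not the author's computing routine. The function names and the small loading matrix in the usage lines are hypothetical. The weights are re-centered to a zero mean inside the function, which changes nothing when they already sum to zero. T is the dot product of the cosine vector with (a, b, ..., n). Dividing that vector by its length therefore gives the unit vector with the largest possible T (by the Cauchy-Schwarz inequality), and this is exactly what formula (11) does.

```python
import math


def rotation_cosines(F0, weights):
    """Approximate rotation of orthogonal factors toward a set of weights.

    F0: list of rows, one per test; each row holds the loadings on the
        orthogonal factors I, II, ..., N.
    weights: one arbitrary weight per test.
    Returns the cosines x, y, ..., u of formula (11).
    """
    mean_w = sum(weights) / len(weights)
    w = [wi - mean_w for wi in weights]          # weights corrected to a zero mean
    n_factors = len(F0[0])
    # a, b, ..., n: cross products of the weights with each factor column,
    # i.e. the entries of F0'W
    sums = [sum(wi * row[k] for wi, row in zip(w, F0)) for k in range(n_factors)]
    norm = math.sqrt(sum(s * s for s in sums))  # sqrt(a^2 + b^2 + ... + n^2)
    return [s / norm for s in sums]              # each cosine keeps the sign of its sum


def rotated_loadings(F0, cosines):
    """Loadings on the new factor F: each row of F0 dotted with the cosines."""
    return [sum(l * c for l, c in zip(row, cosines)) for row in F0]


# Hypothetical two-factor example with four tests
F0 = [[0.7, 0.3], [0.6, -0.4], [0.5, 0.5], [0.4, -0.2]]
weights = [3, -1, 1, -3]
cosines = rotation_cosines(F0, weights)
print([round(c, 3) for c in cosines])
print([round(l, 3) for l in rotated_loadings(F0, cosines)])
```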


Sample Calculation 


An example may clarify the process further. Table 1 summarizes the 
factor loadings on four orthogonal centroid factors, removed from a matrix 
of intercorrelations of Q-sort descriptions of twelve salesmen. On the basis 
of company ratings, the salesmen were placed in pairs in six categories, from 
most competent to least competent, as shown in Table 1. It was desired to 
determine by algebraic rotation the one factor that would best discriminate 
the subjects according to their competence as salesmen. 

As shown in Table 1, arbitrary weights of +5, +3, +1, −1, −3, and
−5 are assigned on the basis of the salesmen's ratings, +5 representing the
highest rating and −5 the lowest one. The rotation cosines are computed
as follows:




$$\begin{aligned}
a &= 5 \times .69 + 5 \times .64 + 3 \times .72 + \cdots + (-5) \times .68 = +1.28,\\
b &= 5 \times .43 + 5 \times .39 + 3 \times (-.00) + \cdots + (-5) \times (-.28) = +6.14,\\
c &= 5 \times .09 + 5 \times .21 + 3 \times (-.17) + \cdots = -1.97,\\
d &= 5 \times (-.04) + \cdots + (-5) \times (-.15) = +2.39.
\end{aligned}$$

Therefore

$$x^2 = \frac{(1.28)^2}{(1.28)^2 + (6.14)^2 + (-1.97)^2 + (2.39)^2} = 0.033484, \qquad x = +0.183.$$

Similarly, y = +0.878, z = −0.282, and w = +0.342.


TABLE 1

Company Ratings and Factor Data for Twelve Salesmen

[Columns include Person, Proficiency Rating, and Arbitrary Weight; the body of the table is illegible in the source scan.]




TABLE 2

Maximized Cross-Product Rotation Applied to
Thurstone's Centroid Matrix of Fictitious Test Battery

[Column groups: Actual Simple Structure; Cross-Product Rotation. The body of the table is illegible in the source scan.]




Using x, y, z, and w as the rotation cosines, the loadings on factor F
are obtained in the usual manner. For example, person one's loading on
factor F = .183 × .69 + .878 × .43 + (−.282) × .09 + .342 × (−.04) =
+.46. The computed loadings are shown in Table 1. The last column in
Table 1 shows the results of Tucker's solution [5] for maximizing the corre-
lation with the arbitrary weights.
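The arithmetic of the sample calculation is easy to check mechanically. The short Python sketch below takes the four cross-product sums a, b, c, d reported above and person one's loadings quoted in the preceding paragraph. It then recomputes the cosines by formula (11) and person one's loading on F.

```python
import math

# Cross-product sums a, b, c, d reported in the sample calculation
sums = [1.28, 6.14, -1.97, 2.39]
norm = math.sqrt(sum(s * s for s in sums))    # sqrt(a^2 + b^2 + c^2 + d^2)
cosines = [s / norm for s in sums]            # x, y, z, w by formula (11)
print([round(c, 3) for c in cosines])         # [0.183, 0.878, -0.282, 0.342]

# Person one's centroid loadings on factors I-IV, as quoted in the text
person_one = [0.69, 0.43, 0.09, -0.04]
loading = sum(l * c for l, c in zip(person_one, cosines))
print(round(loading, 2))                      # 0.46
```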

An example of the accuracy of the method is given in Table 2. To
illustrate his method of extended vectors, Thurstone presented a fictitious
simple configuration that was then rotated into a centroid orthogonal matrix
([3], pp. 208-209, and [4], pp. 230-231). Rerotation to the original simple struc-
ture therefore offers an exact test of rotational methods. Table 2 presents the
results of the rotation method described here, in which the original simple
structure loadings were used to determine the arbitrary weights. Although
approximate, the solution clearly reveals the underlying simple structure
when the data are plotted. This makes additional adjustment by graphical
methods simple and straightforward. Figure 1 shows two sample plots. It
should perhaps be emphasized that the purpose of this illustration is to
show the relative ability of the method to duplicate desired arbitrary di-
mensions known to be present in the factor space. The use of a simple
structure configuration is regarded as only incidental to this purpose.

FIGURE 1

Maximized Cross-Product Factor Plots




REFERENCES 


[1] Eysenck, H. J. Criterion analysis—an application of the hypothetico-deductive method 
to factor analysis. Psychol. Rev., 1950, 57, 38-53. 

[2] Mosier, C. I. Determining a simple structure when loadings for certain tests are known. 
Psychometrika, 1939, 4, 149-162. 

[3] Thurstone, L. L. A new rotational method in factor analysis. Psychometrika, 1938, 3, 
199-218. 

[4] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947. 

[5] Tucker, L. R. A semi-analytic method of factorial rotation to simple structure.
Psychometrika, 1944, 9, 43-68.


Original manuscript received 3/5/56 


Revised manuscript received 12/11/56 




PSYCHOMETRIKA—VOL. 22, No. 3
SEPTEMBER, 1957 


A SIGNIFICANCE TEST FOR THE HYPOTHESIS 
THAT TWO VARIABLES MEASURE THE SAME TRAIT 
EXCEPT FOR ERRORS OF MEASUREMENT* 


FREDERIC M. LORD
EDUCATIONAL TESTING SERVICE 


The likelihood-ratio significance test is derived for the hypothesis that 
after correction for attenuation two variables have a perfect correlation in 
the population from which the sample is drawn. 


It is frequently desired to determine whether or not two somewhat
different tests are actually measuring the same thing. Is there really any
difference, for example, between the abilities measured by a synonyms test
and the abilities measured by an antonyms test? Or between the abilities
measured by the “college” and the “high school” forms of a test of “quanti-
tative” aptitude? If the same student takes both forms, his two scores may
be obviously different from each other. This is partly because the two score
scales have a different origin and a different unit of measurement, and partly
because each score contains an error of measurement. The basic question
to be asked is whether these two tests would have a correlation of 1.00 if
all errors of measurement were eliminated. The purpose of the present paper
is to derive and present the likelihood-ratio significance test for the hypothesis
that such a correlation is 1.00. Although it would be desirable to take the
effects of sampling test items into account, the derivation here will be con-
cerned only with the sampling of examinees.

The procedure for estimating what the value of a correlation coefficient
would be if all errors of measurement were eliminated was developed by
Spearman, who called it correcting for attenuation. The basic formula is

$$P_{xy} = \frac{\rho_{xy}}{\sqrt{\rho_{xx}\rho_{yy}}}, \qquad (1)$$

where $\rho_{xy}$ is the population value of the correlation between test X and test
Y, $\rho_{xx}$ and $\rho_{yy}$ are the population values of the reliability coefficients for the
two tests, and $P_{xy}$ is the population value of the correlation between x and
y corrected for attenuation. For the sake of brevity, $P_{xy}$ will hereafter be
called the disattenuated correlation. The assumptions underlying (1) are
discussed in many texts (e.g., [3], chap. 9).

Suppose the errors of measurement in x are uncorrelated with y and the errors
of measurement in y are uncorrelated with x, as they are ordinarily assumed
to be. Then the true value of the disattenuated correlation coefficient as
defined by (1) can never be greater than 1.00 or less than −1.00. In working
with actual data, however, sampling fluctuations may cause the observed
correlation ($r_{xy}$) between test X and test Y to be larger than the true value
($\rho_{xy}$), or the observed reliability coefficients ($r_{xx}$ and $r_{yy}$) to be unduly small.
It therefore sometimes occurs, when sample values are substituted for popula-
tion values in (1), that the resulting estimate of the disattenuated correlation
is numerically larger than 1.00. This unfortunate result of sampling fluctua-
tions has caused many workers to regard the whole notion of correction for
attenuation with mistrust. Actually, however, the correction for at-
tenuation answers a real and important question, and there is no need to
mistrust the results if they are properly interpreted.

*The writer is indebted to Professor John W. Tukey for his valuable suggestions on
an earlier draft.

The practical problem arises when a disattenuated correlation coefficient 
has been computed from actual data and found to be equal, say, to .90 or 
to .95. Is the obtained disattenuated correlation coefficient consistent with 
the hypothesis that test X and test Y are really measures of the same ability 
or trait, i.e., with the hypothesis that $P_{xy} = 1$?
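Formula (1) is simple to compute. A short Python sketch shows how ordinary sample values can push the estimate past 1.00. The first call reproduces the .978 that appears in the numerical examples of this paper. The second set of inputs is hypothetical, chosen only to illustrate the sampling problem described above.

```python
import math


def disattenuated(r_xy, r_xx, r_yy):
    """Sample estimate of P_xy from formula (1): r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


print(round(disattenuated(0.88, 0.90, 0.90), 3))   # 0.978
# Hypothetical sample values: a slightly high r_xy with modest reliabilities
print(round(disattenuated(0.85, 0.80, 0.85), 3))   # 1.031, i.e. above 1.00
```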

The reliability coefficients to be 
obtained in various ways, as for ex 
methods. The present derivation Will be concerned 


ays of combining these four 
term for (1) may be readily 
wo alternate formulas, and a 
on. 

5 are available for the disattenuated 


correlation coefficient (e.g., [4], pp. 526-529). The standard error, however,
cannot be used with any assurance to test the hypothesis that $P_{xy} = 1$,
because it is not clear to what extent the distribution of the sample estimate
of $P_{xy}$ can be approximated by the normal curve.


The problem concerns four random variables (scores on the four test
forms), denoted $x_1$, $x_2$, $y_3$, and $y_4$. These variables will be assumed to have
a normal multivariate distribution with unspecified means and the covariance
matrix $[\sigma_{ij}]$ $(i, j = 1, 2, 3, 4)$. The two x-variables, and likewise the two
y-variables, are assumed to be “parallel,” i.e.,

$$\sigma_{11} = \sigma_{22}, \quad \sigma_{33} = \sigma_{44}, \quad \sigma_{13} = \sigma_{23} = \sigma_{14} = \sigma_{24}. \qquad (2)$$

In practical work, the assumptions stated in (2) can all be tested simul-
taneously by means of Votaw's test for compound symmetry [8] before any
further calculations are carried out.


FREDERIC M. LORD 209 


The null hypothesis, $H_0$, is that $P_{xy} = 1$; the alternative, $H_1$, is that
$P_{xy} < 1$. Under $H_0$, the maximum likelihood estimates, $\tilde\sigma_{ij}$, of the unknown
parameters will be given by (26), (27), (45), and (46). Under $H_1$, the maximum
likelihood estimates, $\dot\sigma_{ij}$, are given by (48) and by (10) through (14). The
corresponding maximum likelihood estimate, $\dot P_{xy}$, of the disattenuated
correlation is equal to the quantity $\hat P_{xy}$ given by (18), except that if $\hat P_{xy} > 1$,
then $\dot P_{xy} = 1$.

The likelihood ratio statistic $\chi_1$ for testing the null hypothesis is given
by (56). The significance test is carried out by looking up $\chi_1$ in a normal
curve table. The procedure is illustrated by the following numerical examples.


Numerical Examples 


Table 1 summarizes significance tests of the null hypothesis for several
sets of hypothetical data. These data were chosen with a view to simplifying
the calculations. When the observed statistics for variables 1 and 2 are equal
to the corresponding statistics for variables 3 and 4, considerations of symmetry
lead to a simple solution for $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$, as given by (47). Thus in Table 1,
the observed variances are all assumed to be equal ($s_{11} = s_{22} = s_{33} = s_{44}$),
and the observed correlations ($r_{ij}$) are assumed to be such that $r_{12} = r_{34}$ and
$r_{13} = r_{14} = r_{23} = r_{24}$. Although real data do not ordinarily display such exact
relationships, they frequently may approximate the data of Table 1. Under these
conditions, the maximum likelihood estimates under $H_1$ of the population
correlation coefficients are given by $\hat\rho_{ij} = r_{ij}$. From (47), the maximum likelihood
estimates under $H_0$ are given by $\tilde\rho_{12} = \tilde\rho_{34} = \tilde\rho_{13} = \tfrac{1}{3}(r_{12} + 2r_{13})$.

Each row of the table is assumed to be based on N = 100 cases.
The first two values in each row of the table represent the observed data. The
value in the third column is obtained by the special formula given in the
preceding paragraph. The fourth column gives the estimate of the dis-
attenuated correlation derived from (18). The last two columns give the
likelihood ratio statistic, computed from (56), and the probability that
as large a value of this statistic would occur by chance under the null hypoth-
esis.

The first line of Table 1 shows that if test X and test Y each has a
reliability of .90, then, for the data illustrated, a sample disattenuated
correlation of .978 lies at the 2½-per cent level under the null hypothesis.
The third and fifth lines of the table show that when the test reliabilities
are both .80, a sample disattenuated correlation of .978 lies at the 17-per cent
significance level, and a sample disattenuated correlation of .95 lies at the
2½-per cent significance level. A comparison of these figures indicates, as
would be expected, that the lower the test reliability, the more the sample
disattenuated correlation may be expected to differ from 1.00 solely as the
result of sampling fluctuations.




TABLE 1

Significance Test for the Hypothesis That the Disattenuated Correlation
Equals 1.00 for Various Hypothetical Data ($N = 100$, $s_{11} = s_{22} = s_{33} = s_{44}$)

Columns: (1) observed $r_{12} = r_{34}$; (2) observed $r_{13} = r_{14} = r_{23} = r_{24}$;
(3) maximum likelihood estimate under $H_0$, $\tilde\rho_{12} = \tilde\rho_{34} = \tilde\rho_{13} = \cdots$;
(4) estimated disattenuated correlation $\hat P_{xy}$; (5) significance level.

 (1)     (2)        (3)        (4)      (5)
 .90     .88        .886667    .978     .025
 .90     .87        .88        .967     .005
 .80     .782222    .788148    .978     .17
 .80     .77        .78        .9625    .06
 .80     .76        .773333    .95      .025
 .80     .75        .766667    .9375    ...
 .80     .74        .76        .925     .005

[The likelihood ratio column is illegible in the source scan.]
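Under the symmetric conditions of Table 1, every quantity in the test has a closed form, so the table can be recomputed. The Python sketch below is one way to do that. It is an illustrative reconstruction, not the author's worksheet, and the function names are hypothetical. It assumes unit variances, which is permissible because the determinant ratio does not involve the variances. It then uses the special-case estimate (47), formula (18), the determinant formula (55), the statistic (54), and the one-sided normal tail implied by (50). For the rows whose significance levels are legible, it should agree with the printed .025, .17, .06, and .025.

```python
import math


def det55(v11, v33, c12, c34, c):
    """Determinant of the patterned 4 x 4 covariance matrix, formula (55)."""
    return (v11 - c12) * (v33 - c34) * ((v11 + c12) * (v33 + c34) - 4 * c * c)


def symmetric_case_test(r12, r13, n):
    """Table 1 style test when r12 = r34 and r13 = r14 = r23 = r24.

    Works on the correlation scale (all variances 1).
    Returns (rho_tilde, P_hat, chi_1, significance level).
    """
    rho_tilde = (r12 + 2 * r13) / 3          # H0 estimate, special case (47)
    p_hat = r13 / r12                        # (18) when all variances are equal
    if p_hat >= 1:                           # lambda = 1, so the statistic is zero
        return rho_tilde, p_hat, 0.0, 1.0
    det_h1 = det55(1.0, 1.0, r12, r12, r13)
    det_h0 = det55(1.0, 1.0, rho_tilde, rho_tilde, rho_tilde)
    chi_sq = (n - 1) * math.log(det_h0 / det_h1)   # (54)
    chi_1 = math.sqrt(chi_sq)
    # By (50), P(-2 ln lambda > c) = P(Z > sqrt(c)): a one-sided normal tail
    level = 0.5 * math.erfc(chi_1 / math.sqrt(2))
    return rho_tilde, p_hat, chi_1, level


for r12, r13 in [(0.90, 0.88), (0.80, 0.782222), (0.80, 0.77), (0.80, 0.76)]:
    rho, p_hat, chi_1, level = symmetric_case_test(r12, r13, 100)
    print(f"{r12:.2f} {r13:.6f} {rho:.6f} {p_hat:.4f} {chi_1:.3f} {level:.3f}")
```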


The following sample covariance matrix, computed from the test scores 
of 649 examinees, will provide a numerical example of some practical interest: 


$$[s_{ij}] = \begin{bmatrix}
86.3979 & 57.7751 & 56.8651 & 58.8986\\
57.7751 & 86.2632 & 59.3177 & 59.6683\\
56.8651 & 59.3177 & 97.2850 & 73.8201\\
58.8986 & 59.6683 & 73.8201 & 97.8192
\end{bmatrix}$$
The corresponding sample correlation matrix is 


$$[r_{ij}] = \begin{bmatrix}
1 & .6692 & .6203 & .6407\\
.6692 & 1 & .6475 & .6496\\
.6203 & .6475 & 1 & .7567\\
.6407 & .6496 & .7567 & 1
\end{bmatrix}$$


The first two variables in these matrices are parallel forms of a 15-item
vocabulary test administered under such liberal time limits that approxi-
mately 97 per cent of the examinees completed each form. The last two
variables are parallel forms of a 75-item vocabulary test constructed so as
to be as parallel as possible to the 15-item forms, except that the adminis-
tration time allowed was so short that only about two per cent of the ex-
aminees completed each form. A more detailed description of the data is
[...]. The question here is whether or not the speeded and unspeeded
vocabulary tests actually measure the same ability. Assuming only that the
first two variables are parallel measures and the last two variables likewise,
the maximum likelihood estimates of the true variances and covariances
are readily found from (10) through (14) to be


$$[\hat\sigma_{ij}] = \begin{bmatrix}
86.3305 & 57.7751 & 58.6874 & 58.6874\\
57.7751 & 86.3305 & 58.6874 & 58.6874\\
58.6874 & 58.6874 & 97.5521 & 73.8201\\
58.6874 & 58.6874 & 73.8201 & 97.5521
\end{bmatrix}.$$


The best estimates of the true values of the intercorrelations among the 
variables under the assumptions just stated are, according to (16) and (17), 


$$[\hat\rho_{ij}] = \begin{bmatrix}
1 & .669231 & .639505 & .639505\\
.669231 & 1 & .639505 & .639505\\
.639505 & .639505 & 1 & .756725\\
.639505 & .639505 & .756725 & 1
\end{bmatrix}.$$


It is interesting to note that the values of $\hat\rho_{12}$ and $\hat\rho_{34}$ are, respectively,
identical to $r_{12}$ and $r_{34}$ to four figures. Likewise, $\hat\rho_{13}$ is identical to four figures
with the average value $(r_{13} + r_{14} + r_{23} + r_{24})/4$. The best estimate of $P_{xy}$ is
obtained by substituting $\hat\rho_{13}$ for $\rho_{xy}$ in equation (1), $\hat\rho_{12}$ for $\rho_{xx}$, and $\hat\rho_{34}$ for
$\rho_{yy}$. The result is $\hat P_{xy} = .898643$.




TABLE 2

Points Lying on the Two Curves
Representing Equations (45) and (46)

   Equation (45)                       Equation (46)
$\tilde\sigma_{12}$      $\tilde\sigma_{34}$          $\tilde\sigma_{34}$      $\tilde\sigma_{12}$
 60          54.2                60          65.5
 50          74.0                70          52.6
 52          70.7                75          48.4
 51.7        71.169              70.8        52.58
 51.64       71.2627             71.18       51.786
 51.63       71.2855             71.265      51.6511
 51.638      71.2721             71.27       51.6238




The values of $\tilde\sigma_{ij}$ were determined from (47), (45), (46), and (26). Equa-
tions (45) and (46) were solved graphically by finding the intersection of
the two curves representing the equations. A larger and larger scale was
used in successive plots of the curves in order to locate the intersection.
As shown in Table 2, fewer than ten points had to be computed for each of
the two curves in order to find their intersection graphically to five figures.
The best estimate of the covariance matrix under the null hypothesis was
thus found to be


$$[\tilde\sigma_{ij}] = \begin{bmatrix}
86.330 & 51.639 & 60.665 & 60.665\\
51.639 & 86.330 & 60.665 & 60.665\\
60.665 & 60.665 & 97.552 & 71.270\\
60.665 & 60.665 & 71.270 & 97.552
\end{bmatrix}.$$


The estimates of the correlations between the variables under the null 
hypothesis are 


$$[\tilde\rho_{ij}] = \begin{bmatrix}
1 & .59815 & .66106 & .66106\\
.59815 & 1 & .66106 & .66106\\
.66106 & .66106 & 1 & .73058\\
.66106 & .66106 & .73058 & 1
\end{bmatrix}.$$


The significance test is completed, finally, as it would have been if the
number of cases had been 101. In this case,

$$\chi_1^2 = \frac{100}{648}(35.30) = 5.4475 \quad\text{and}\quad \chi_1 = 2.33.$$


A Solution for Maximum Likelihood Estimators

The following facts hold:

(i) the normal multivariate distribution is determined by its first two moments;

(ii) the second moments of a sample from a normal distribution are distributed independently of the first moments;

(iii) the problem here is concerned with second moments and not with first moments.

It follows that we may restrict attention to the sampling distribution of second
moments, i.e., to the Wishart distribution (e.g., [2], pp. 403-406). For present
purposes this may be written


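$$L(s_{ij};\,\sigma_{ij}) = K\,|\sigma|^{-(N-1)/2}\exp\Big(-\frac{N-1}{2}\sum_{i=1}^{4}\sum_{j=1}^{4}\sigma^{ij}s_{ij}\Big), \qquad (3)$$

where K is an expression that does not involve the $\sigma_{ij}$, N is the number of
examinees, and $|\sigma|$ is the determinant of $[\sigma_{ij}]$. The symbol $\sigma^{ij}$ denotes an element
of the inverse matrix $[\sigma^{ij}] = [\sigma_{ij}]^{-1}$. The quantity $s_{ij}$ is the unbiased sample
estimator obtained by multiplying the usual sample variance or covariance by
$N/(N - 1)$. It is implicit in (3) that $|\sigma| \neq 0$.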

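It is convenient to rewrite the restrictions in (2) in terms of the $\sigma^{ij}$, so
that they become

$$\sigma^{11} = \sigma^{22}, \quad \sigma^{33} = \sigma^{44}, \quad \sigma^{13} = \sigma^{23} = \sigma^{14} = \sigma^{24}. \qquad (4)$$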


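There is implicit in $H_1$ a further restriction not ordinarily encountered in
the usual multivariate tests of significance. This restriction is that $P_{xy} < 1$.
Since by (1),

$$P_{xy} = \frac{\sigma_{xy}}{\sqrt{(\sigma_{12}/\sigma_{11})(\sigma_{34}/\sigma_{33})\,\sigma_{11}\sigma_{33}}} = \frac{\sigma_{xy}}{\sqrt{\sigma_{12}\sigma_{34}}}, \qquad (5)$$

this restriction may be restated as

$$\sigma_{xy}^2 < \sigma_{12}\sigma_{34}. \qquad (6)$$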


It should be noted that this inequality is not implied by the Gramian character
of the covariance matrix. This inequality reflects the fact that $x_1$ and $x_2$ are
known to contain errors of measurement that are uncorrelated with $y_3$ and
$y_4$. Likewise, $y_3$ and $y_4$ are known to contain errors of measurement uncorre-
lated with $x_1$ and $x_2$. These errors of measurement impose an upper limit on
the correlation between variables x and y.

A restriction involving an inequality introduces certain difficulties into 
the maximization problem. It will be convenient first to carry through the 
maximization while ignoring the inequality. The results obtained when the 
inequality is taken into consideration will be worked out in a later section. 

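The quantity to be differentiated may be written

$$Q = -\log|\sigma| - \sum_{i}\sum_{j}\sigma^{ij}s_{ij}. \qquad (7)$$

It will be convenient to differentiate partially with respect to the $\sigma^{ij}$ rather
than with respect to the $\sigma_{ij}$.

The partial derivative of $\log|\sigma|$ with respect to $\sigma^{ij}$ is

$$\frac{\partial\log|\sigma|}{\partial\sigma^{ij}} = (\delta_{ij} - 2)\,\sigma_{ij}, \qquad (8)$$

where $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \neq j$.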




The partial derivatives of (7) are to be found under restrictions (4). A
convenient formula states that the derivative of any function $f(x, y)$ with
respect to x under the restriction that $x = y$ is equal to

$$\left[\frac{\partial f(x, y)}{\partial x} + \frac{\partial f(x, y)}{\partial y}\right]_{y = x}.$$

With the help of this formula, it is readily found that

$$\frac{\partial Q}{\partial\sigma^{11}} = 2\sigma_{11} - s_{11} - s_{22}. \qquad (9)$$


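If (9) is placed equal to zero and a “hat” (ˆ) placed over the unobserved
quantity, it is found that

$$\hat\sigma_{11}\,(= \hat\sigma_{22}) = \tfrac{1}{2}(s_{11} + s_{22}). \qquad (10)$$

Similarly,

$$\hat\sigma_{33}\,(= \hat\sigma_{44}) = \tfrac{1}{2}(s_{33} + s_{44}), \qquad (11)$$

$$\hat\sigma_{12} = s_{12}, \qquad (12)$$

$$\hat\sigma_{34} = s_{34}, \qquad (13)$$

$$\hat\sigma = \bar s, \qquad (14)$$

where $\hat\sigma = \hat\sigma_{13} = \hat\sigma_{14} = \hat\sigma_{23} = \hat\sigma_{24}$ and $\bar s = (s_{13} + s_{14} + s_{23} + s_{24})/4$. Equations
(10) through (14) express the maximum likelihood estimates of the unknown
population variances and covariances in terms of the observed sample
variances and covariances.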

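The population value of the disattenuated correlation is given by (1).
Since the maximum likelihood estimate of a function of parameters is equal
to the same function of the maximum likelihood estimates of the parameters,
the maximum likelihood estimate of $P_{xy}$ is

$$\hat P_{xy} = \frac{\hat\rho_{13}}{\sqrt{\hat\rho_{12}\hat\rho_{34}}}. \qquad (15)$$

Since

$$\hat\rho_{12} = \frac{\hat\sigma_{12}}{\hat\sigma_{11}} = \frac{2s_{12}}{s_{11} + s_{22}}, \qquad \hat\rho_{34} = \frac{\hat\sigma_{34}}{\hat\sigma_{33}} = \frac{2s_{34}}{s_{33} + s_{44}}, \qquad (16)$$

$$\hat\rho_{13} = \frac{\hat\sigma}{\sqrt{\hat\sigma_{11}\hat\sigma_{33}}} = \frac{s_{13} + s_{14} + s_{23} + s_{24}}{2\sqrt{(s_{11} + s_{22})(s_{33} + s_{44})}}, \qquad (17)$$

it is readily found that

$$\hat P_{xy} = \frac{s_{13} + s_{14} + s_{23} + s_{24}}{4\sqrt{s_{12}s_{34}}}. \qquad (18)$$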




Formula (18) is different from those in the literature with which the writer 
is familiar, but the difference is negligible when N is large. 
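Equations (10) through (18) can be applied directly to the 649-examinee covariance matrix given earlier. The following Python sketch should reproduce the printed estimates, including $\hat P_{xy} = .898643$. The function name `mle_unrestricted` is hypothetical. The sketch also confirms numerically that (15) and (18) agree.

```python
import math

# Sample covariance matrix for the 649 examinees (variables x1, x2, y3, y4)
S = [
    [86.3979, 57.7751, 56.8651, 58.8986],
    [57.7751, 86.2632, 59.3177, 59.6683],
    [56.8651, 59.3177, 97.2850, 73.8201],
    [58.8986, 59.6683, 73.8201, 97.8192],
]


def mle_unrestricted(S):
    """Estimates (10)-(14), which ignore restriction (6)."""
    v11 = (S[0][0] + S[1][1]) / 2                      # (10)
    v33 = (S[2][2] + S[3][3]) / 2                      # (11)
    c12 = S[0][1]                                      # (12)
    c34 = S[2][3]                                      # (13)
    c = (S[0][2] + S[0][3] + S[1][2] + S[1][3]) / 4    # (14): s-bar
    return v11, v33, c12, c34, c


v11, v33, c12, c34, c = mle_unrestricted(S)
rho12, rho34 = c12 / v11, c34 / v33                    # (16)
rho13 = c / math.sqrt(v11 * v33)                       # (17)
p_hat = c / math.sqrt(c12 * c34)                       # (18)
print(f"{v11:.4f} {v33:.4f} {c:.4f}")
print(f"{rho12:.6f} {rho34:.6f} {rho13:.6f} {p_hat:.6f}")
```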


Estimation Under the Null Hypothesis That P = 1 


The next step is to use the same approach to derive maximum likelihood
estimates under the restrictive hypothesis that $P_{xy} = 1$. An equivalent
statement of this hypothesis, as indicated previously, is

$$\sigma_{xy}^2 = \sigma_{12}\sigma_{34}. \qquad (19)$$


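If Lagrange multipliers $\mu$, $\nu$, and $\nu_{IJ}$ are used, the quantity to be differentiated
may now be written

$$Q = -\log|\sigma| - \sum_i\sum_j\sigma^{ij}s_{ij} + \mu(\sigma^{11} - \sigma^{22}) + \nu(\sigma^{33} - \sigma^{44}) + 2\sum_I\sum_J\nu_{IJ}(\sigma_{IJ}^2 - \sigma_{12}\sigma_{34}), \qquad (20)$$

with $I = 1, 2$ and $J = 3, 4$.
It can be shown that

$$\frac{\partial\sigma_{ij}}{\partial\sigma^{kl}} = \left(\tfrac{1}{2}\delta_{kl} - 1\right)(\sigma_{ik}\sigma_{lj} + \sigma_{il}\sigma_{kj}). \qquad (21)$$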


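The necessary derivatives of (20) can now be written down. First,

$$\frac{\partial Q}{\partial\sigma^{11}} = \sigma_{11} - s_{11} + \mu + 2\sum_I\sum_J\nu_{IJ}(-2\sigma_{IJ}\sigma_{I1}\sigma_{1J} + \sigma_{12}\sigma_{13}\sigma_{14} + \sigma_{34}\sigma_{11}\sigma_{12}), \qquad (22)$$

$$\frac{\partial Q}{\partial\sigma^{22}} = \sigma_{22} - s_{22} - \mu + 2\sum_I\sum_J\nu_{IJ}(-2\sigma_{IJ}\sigma_{I2}\sigma_{2J} + \sigma_{12}\sigma_{23}\sigma_{24} + \sigma_{34}\sigma_{12}\sigma_{22}). \qquad (23)$$

Now all values of $\sigma_{IJ}$ are equal, and may be denoted simply by the
symbol $\sigma$. Set (22) and (23) each equal to zero and add them together.
Supply the tilde symbol to designate the maximum likelihood estimates
obtained under the hypothesis that $P_{xy} = 1$, and replace $\tilde\sigma_{22}$ by the equal
quantity $\tilde\sigma_{11}$. The result is

$$0 = 2\tilde\sigma_{11} - (s_{11} + s_{22}) + 2\sum_I\sum_J\nu_{IJ}(-2\tilde\sigma^2\tilde\sigma_{11} - 2\tilde\sigma^2\tilde\sigma_{12} + 2\tilde\sigma^2\tilde\sigma_{12} + 2\tilde\sigma_{11}\tilde\sigma_{12}\tilde\sigma_{34}), \qquad (24)$$

$$0 = 2\tilde\sigma_{11} - (s_{11} + s_{22}) + 4\sum_I\sum_J\nu_{IJ}\,\tilde\sigma_{11}(\tilde\sigma_{12}\tilde\sigma_{34} - \tilde\sigma^2). \qquad (25)$$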
т T 




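Because of (19),

$$\tilde\sigma^2 = \tilde\sigma_{12}\tilde\sigma_{34}. \qquad (26)$$

The desired maximum likelihood estimator is, therefore,

$$\tilde\sigma_{11}\,(= \tilde\sigma_{22}) = \tfrac{1}{2}(s_{11} + s_{22}) = \hat\sigma_{11}. \qquad (27)$$

A similar equation holds for $\tilde\sigma_{33} = \hat\sigma_{33}$.

Next,

$$\frac{\partial Q}{\partial\sigma^{GH}} = 2\sigma_{GH} - 2s_{GH} + 2\sum_I\sum_J\nu_{IJ}\big[-2\sigma_{IJ}(\sigma_{IG}\sigma_{HJ} + \sigma_{IH}\sigma_{GJ}) + \sigma_{34}(\sigma_{1G}\sigma_{H2} + \sigma_{1H}\sigma_{G2}) + \sigma_{12}(\sigma_{3G}\sigma_{H4} + \sigma_{3H}\sigma_{G4})\big] \qquad (28)$$

$(G = 1, 2;\ H = 3, 4)$. Placing (28) equal to zero and summing on G and H yields

$$0 = 2\sum_G\sum_H\tilde\sigma_{GH} - 2\sum_G\sum_H s_{GH} + 2\sum_I\sum_J\nu_{IJ}\sum_G\sum_H\big[-2\tilde\sigma_{IJ}(\tilde\sigma_{IG}\tilde\sigma_{HJ} + \tilde\sigma_{IH}\tilde\sigma_{GJ}) + \tilde\sigma_{34}(\tilde\sigma_{1G}\tilde\sigma_{H2} + \tilde\sigma_{1H}\tilde\sigma_{G2}) + \tilde\sigma_{12}(\tilde\sigma_{3G}\tilde\sigma_{H4} + \tilde\sigma_{3H}\tilde\sigma_{G4})\big]. \qquad (29)$$

If $\sum_G\sum_H\tilde\sigma_{IG}\tilde\sigma_{HJ}$ is written out in full, it is seen that, because of its symmetry,
it will remain unaltered irrespective of the values assumed by I and J. If
all terms of (29) are written out and collected, it is found that

$$0 = 2\tilde\sigma - 2\bar s - \tilde\sigma\Big(\sum_I\sum_J\nu_{IJ}\Big)(\tilde\sigma_{11}\tilde\sigma_{33} - \tilde\sigma_{12}\tilde\sigma_{33} - \tilde\sigma_{11}\tilde\sigma_{34} - 3\tilde\sigma_{12}\tilde\sigma_{34} + 4\tilde\sigma^2). \qquad (30)$$

Using (26) and factoring gives the result

$$2(\tilde\sigma - \bar s) = \tilde\sigma\Big(\sum_I\sum_J\nu_{IJ}\Big)(\tilde\sigma_{11} - \tilde\sigma_{12})(\tilde\sigma_{33} - \tilde\sigma_{34}). \qquad (31)$$

Before doing further work with (31), it will be helpful to work with

$$\frac{\partial Q}{\partial\sigma^{12}} = 2\sigma_{12} - 2s_{12} + 2\sum_I\sum_J\nu_{IJ}\big[-2\sigma_{IJ}(\sigma_{I1}\sigma_{2J} + \sigma_{I2}\sigma_{1J}) + \sigma_{12}(\sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23}) + \sigma_{34}(\sigma_{11}\sigma_{22} + \sigma_{12}^2)\big]. \qquad (32)$$

Placing (32) equal to zero, it is found that

$$\tilde\sigma_{12} - s_{12} = -\sum_I\sum_J\nu_{IJ}(-2\tilde\sigma^2\tilde\sigma_{11} + \tilde\sigma_{34}\tilde\sigma_{11}^2 + \tilde\sigma_{34}\tilde\sigma_{12}^2). \qquad (33)$$

Using (26) and factoring,

$$\tilde\sigma_{12} - s_{12} = -\Big(\sum_I\sum_J\nu_{IJ}\Big)(\tilde\sigma_{11} - \tilde\sigma_{12})^2\,\tilde\sigma_{34}. \qquad (34)$$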




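By symmetry,

$$\tilde\sigma_{34} - s_{34} = -\Big(\sum_I\sum_J\nu_{IJ}\Big)(\tilde\sigma_{33} - \tilde\sigma_{34})^2\,\tilde\sigma_{12}. \qquad (35)$$

Equations (26), (31), (34), and (35) remain to be solved, the four un-
knowns being $\tilde\sigma$, $\tilde\sigma_{12}$, $\tilde\sigma_{34}$, and the quantity $\sum_I\sum_J\nu_{IJ}$. As a first step, this
last quantity will be eliminated from (31), (34), and (35). Multiply together
(34) and (35) and use (26) to obtain

$$(\tilde\sigma_{12} - s_{12})(\tilde\sigma_{34} - s_{34}) = \Big(\sum_I\sum_J\nu_{IJ}\Big)^2(\tilde\sigma_{11} - \tilde\sigma_{12})^2(\tilde\sigma_{33} - \tilde\sigma_{34})^2\,\tilde\sigma^2. \qquad (36)$$

Square (31) and subtract it from (36). Then substitute $\hat\sigma_{ij}$ for $s_{ij}$ (and $\hat\sigma$ for $\bar s$) to get

$$(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) = 4\tilde\sigma^2 - 8\tilde\sigma\hat\sigma + 4\hat\sigma^2. \qquad (37)$$

Multiply (34) by $\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34})$ and multiply (31) by $\tilde\sigma(\tilde\sigma_{11} - \tilde\sigma_{12})$. Adding the
results gives

$$2(\tilde\sigma - \hat\sigma)\tilde\sigma(\tilde\sigma_{11} - \tilde\sigma_{12}) + (\tilde\sigma_{12} - \hat\sigma_{12})\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34}) = 0. \qquad (38)$$

Similarly,

$$2(\tilde\sigma - \hat\sigma)\tilde\sigma(\tilde\sigma_{33} - \tilde\sigma_{34}) + (\tilde\sigma_{34} - \hat\sigma_{34})\tilde\sigma_{34}(\tilde\sigma_{11} - \tilde\sigma_{12}) = 0. \qquad (39)$$

The next step is to eliminate $\tilde\sigma$ from (37), (38), and (39).
Multiply (37) by $(\tilde\sigma_{11} - \tilde\sigma_{12})$, add 4 times (38), and use (26), obtaining

$$4(\tilde\sigma_{12} - \hat\sigma_{12})\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34}) = (\tilde\sigma_{11} - \tilde\sigma_{12})\big[4\hat\sigma^2 - (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) - 4\tilde\sigma_{12}\tilde\sigma_{34}\big]. \qquad (40)$$

Similarly,

$$4(\tilde\sigma_{34} - \hat\sigma_{34})\tilde\sigma_{34}(\tilde\sigma_{11} - \tilde\sigma_{12}) = (\tilde\sigma_{33} - \tilde\sigma_{34})\big[4\hat\sigma^2 - (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) - 4\tilde\sigma_{12}\tilde\sigma_{34}\big]. \qquad (41)$$

Two possibly simpler equations may be derived from (40) and (41), as
follows. Multiply (40) by $(\tilde\sigma_{33} - \tilde\sigma_{34})$ and (41) by $(\tilde\sigma_{11} - \tilde\sigma_{12})$. Subtracting
one from the other gives

$$(\tilde\sigma_{12} - \hat\sigma_{12})\tilde\sigma_{12}(\tilde\sigma_{33} - \tilde\sigma_{34})^2 = (\tilde\sigma_{34} - \hat\sigma_{34})\tilde\sigma_{34}(\tilde\sigma_{11} - \tilde\sigma_{12})^2. \qquad (42)$$

Next, multiply (42) by $\tilde\sigma_{12}(\tilde\sigma_{12} - \hat\sigma_{12})$ and extract the square root of both sides of the
equation. Then multiply by 4 and subtract from (40). If $(\tilde\sigma_{11} - \tilde\sigma_{12}) \neq 0$, as will
ordinarily be the case, the resulting equation may be divided by this quantity
to produce

$$4\tilde\sigma_{12}\tilde\sigma_{34} + (\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34}) \pm 4\sqrt{\tilde\sigma_{12}\tilde\sigma_{34}(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34})} = 4\hat\sigma^2. \qquad (43)$$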




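Take the square root of (43) to obtain, finally,

$$2\sqrt{\tilde\sigma_{12}\tilde\sigma_{34}} \pm \sqrt{(\tilde\sigma_{12} - \hat\sigma_{12})(\tilde\sigma_{34} - \hat\sigma_{34})} = \pm 2\hat\sigma. \qquad (44)$$

No simple explicit solution for $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$ seems to be obtainable. Equa-
tions (42) and (44) may, of course, be solved by iterative processes. In the
writer's experience, however, the iterations have converged slowly. Since
(40) and (41) are each linear in one of the unknowns, it is probably easier
to obtain the solution graphically from the intersection of the two curves
representing these equations. If (40) and (41) are solved for $\tilde\sigma_{34}$ and $\tilde\sigma_{12}$,
respectively, the resulting equations, from which the curves may be plotted,
are

$$\tilde\sigma_{34} = \frac{4B\hat\sigma^2 + AB\hat\sigma_{34} - 4A\tilde\sigma_{12}\tilde\sigma_{33}}{AB + 4\tilde\sigma_{12}(B - A)}, \qquad (45)$$

$$\tilde\sigma_{12} = \frac{4D\hat\sigma^2 + CD\hat\sigma_{12} - 4C\tilde\sigma_{34}\tilde\sigma_{11}}{CD + 4\tilde\sigma_{34}(D - C)}, \qquad (46)$$

where $A = \tilde\sigma_{12} - \hat\sigma_{12}$, $B = \tilde\sigma_{11} - \tilde\sigma_{12}$, $C = \tilde\sigma_{34} - \hat\sigma_{34}$, and $D = \tilde\sigma_{33} - \tilde\sigma_{34}$.

Consider the special case where $s_{11} = s_{33}$, $s_{22} = s_{44}$, $s_{12} = s_{34}$, $s_{13} = s_{24}$, and
$s_{14} = s_{23}$. Here symmetry requires that $\tilde\sigma_{12} = \tilde\sigma_{34}$. In this case, (44) is readily
solved, the solution being

$$\tilde\sigma_{12} = \tilde\sigma_{34} = \tilde\sigma = \tfrac{1}{6}(s_{12} + s_{34} + 4\bar s). \qquad (47)$$

This result suggests that whenever the actual data approximate the fore-
going special case, $(2s_{12} + 4\bar s)/6$ and $(2s_{34} + 4\bar s)/6$ can be used as con-
venient first approximations to $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$, respectively.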


When the values of $\tilde\sigma_{12}$ and $\tilde\sigma_{34}$ have been determined, the value of $\tilde\sigma$
may then be obtained from (26).
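The paper locates the intersection of (45) and (46) graphically. As a numerical alternative, which is not the author's procedure, the Python sketch below evaluates both curves and bisects on the gap between them. The starting interval [45, 57] is an assumption read off the points of Table 2 for the 649-examinee data; other data would need their own bracket. By (27), the diagonal H0 estimates equal the unrestricted ones. At the intersection, (44) should hold with the minus sign on the second root.

```python
import math

# Unrestricted estimates for the 649-examinee data, from (10)-(14);
# by (27) the diagonal values are also the H0 estimates.
V11, V33 = 86.33055, 97.5521
S12, S34, S_BAR = 57.7751, 73.8201, 58.687425


def eq45(t):
    """sigma~34 implied by sigma~12 = t, equation (45)."""
    a, b = t - S12, V11 - t
    return (4 * b * S_BAR**2 + a * b * S34 - 4 * a * t * V33) / (a * b + 4 * t * (b - a))


def eq46(u):
    """sigma~12 implied by sigma~34 = u, equation (46)."""
    c, d = u - S34, V33 - u
    return (4 * d * S_BAR**2 + c * d * S12 - 4 * c * u * V11) / (c * d + 4 * u * (d - c))


def intersect(lo, hi, tol=1e-9):
    """Bisection on g(t) = eq46(eq45(t)) - t, the gap between the two curves."""
    def g(t):
        return eq46(eq45(t)) - t

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("interval does not bracket an intersection")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    t = (lo + hi) / 2
    return t, eq45(t)


s12_0, s34_0 = intersect(45.0, 57.0)
s_0 = math.sqrt(s12_0 * s34_0)          # sigma~ from (26)
print(f"{s12_0:.3f} {s34_0:.3f} {s_0:.3f}")
```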


Maximum Likelihood Estimation Under $H_1$

As previously noted in (6), the logic of the mathematical model appropriate for the present problem imposes for $H_1$ the restriction that $P \le 1$ or, equivalently, that $\sigma_{13}^2 \le \sigma_{12}\sigma_{34}$. The values of $\hat\sigma_{ij}$ represented by (10) through (14) were obtained without attention to this restriction. These values provide maximum likelihood estimators of $\sigma_{ij}$ under $H_1$ only so long as they do not violate (6). If the symbols $\dot\sigma_{ij}$ are used to denote those values of $\sigma_{ij}$ consistent with (6) for which (3) is maximized, then it can be shown that


(48) $\dot\sigma_{ij} = \begin{cases} \hat\sigma_{ij} & \text{if these values satisfy (6)},\\ \tilde\sigma_{ij} & \text{otherwise.} \end{cases}$


The $\dot\sigma_{ij}$ are the maximum likelihood estimates of the $\sigma_{ij}$ under $H_1$. Whenever the $\hat\sigma_{ij}$ violate (6), the maximum likelihood estimates under $H_1$ and under $H_0$ are identical.




FREDERIC M. LORD 219 


The Likelihood Ratio Test

The likelihood ratio for testing the hypothesis that $P = 1$ is ([6], p. 257)

(49) $\lambda = \dfrac{L(\tilde\sigma_{11}, \tilde\sigma_{12}, \ldots, \tilde\sigma_{34})}{L(\dot\sigma_{11}, \dot\sigma_{12}, \ldots, \dot\sigma_{34})}.$


This ratio will equal 1 whenever $\hat P > 1$. Because $H_1$ involves an inequality, the usual theorems on the large-sample distributions of likelihood ratios cannot be applied blindly. Application of a recent result obtained by Chernoff [1], however, shows that in large samples


(50) $\Pr(-2\ln\lambda < 0) = 0,$
$\quad\;\; \Pr(-2\ln\lambda = 0) = \tfrac12,$
$\quad\;\; \Pr(-2\ln\lambda \le \chi^2) = \tfrac12 + \tfrac12 F(\chi^2),$


where $\ln\lambda$ is the natural logarithm of $\lambda$, and $F(\chi^2)$ is the cumulative distribution function of a quantity distributed as chi square with one degree of freedom.

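It can be shown that

(51) $\sum_i\sum_j \dot\sigma^{ij}s_{ij} = 4,$

and that

(52) $\sum_i\sum_j \tilde\sigma^{ij}s_{ij} = 4,$

where $\dot\sigma^{ij}$ and $\tilde\sigma^{ij}$ denote elements of the inverses of the estimated covariance matrices $\dot\Sigma$ and $\tilde\Sigma$. From (49), (3), (51), and (52),

(53) $\lambda = \left(|\dot\Sigma|/|\tilde\Sigma|\right)^{(N-1)/2}.$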


The large-sample significance test is carried out by computing

(54) $\chi_1^2 = -2\ln\lambda = (N-1)\left(\ln|\tilde\Sigma| - \ln|\dot\Sigma|\right) = 2.3026\,(N-1)\left(\log_{10}|\tilde\Sigma| - \log_{10}|\dot\Sigma|\right).$

Convenient formulas for computing the values of the determinants in (54) are readily found:


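(55) $|\dot\Sigma| = (\dot\sigma_{11} - \dot\sigma_{12})(\dot\sigma_{33} - \dot\sigma_{34})\left[(\dot\sigma_{11} + \dot\sigma_{12})(\dot\sigma_{33} + \dot\sigma_{34}) - 4\dot\sigma_{13}^2\right]$
$\qquad\;\; = \dot\sigma_{11}^2\dot\sigma_{33}^2\,(1 - \dot\rho_{12})(1 - \dot\rho_{34})\left[(1 + \dot\rho_{12})(1 + \dot\rho_{34}) - 4\dot\rho_{13}^2\right],$

where $\dot\rho_{13} = \dot\sigma_{13}/\sqrt{\dot\sigma_{11}\dot\sigma_{33}}$, $\dot\rho_{12} = \dot\sigma_{12}/\dot\sigma_{11}$, etc. The formula for $|\tilde\Sigma|$ is obtained simply by switching diacritical marks in (55).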




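It is worth noting that the ratio $|\tilde\Sigma|/|\dot\Sigma|$ does not involve any variances, being expressible solely in terms of correlation coefficients. Thus (54) may be rewritten

(56) $\chi_1^2 = (N-1)\ln\dfrac{(1 - \tilde\rho_{12})(1 - \tilde\rho_{34})\left[(1 + \tilde\rho_{12})(1 + \tilde\rho_{34}) - 4\tilde\rho_{13}^2\right]}{(1 - \dot\rho_{12})(1 - \dot\rho_{34})\left[(1 + \dot\rho_{12})(1 + \dot\rho_{34}) - 4\dot\rho_{13}^2\right]}.$

Although the significance test can be written solely in terms of the $\tilde\rho_{ij}$ and $\dot\rho_{ij}$, as in (56), it cannot in general be written solely in terms of the usual sample product-moment correlations, $r_{ij}$.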
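The following is a minimal Python sketch of (54) and (55). It assumes the dotted ($H_1$) and tilde ($H_0$) estimates have already been obtained. The function names and the numbers in the usage lines are hypothetical and serve only to show that the covariance form and the correlation form of (55) agree. The closed form exists because of the block structure: the within-pair contrasts separate out, leaving a $2 \times 2$ determinant for the pair sums.

```python
import math


def det_parallel_forms(v1, c12, v3, c34, c13):
    """Determinant (55) of a 4 x 4 covariance matrix in which variables 1 and 2
    share variance v1 and covariance c12, variables 3 and 4 share variance v3
    and covariance c34, and every cross covariance equals c13."""
    return (v1 - c12) * (v3 - c34) * ((v1 + c12) * (v3 + c34) - 4.0 * c13 ** 2)


def det_correlation_factor(r12, r34, r13):
    """Variance-free bracket of (55), i.e. |Sigma| / (v1**2 * v3**2)."""
    return (1 - r12) * (1 - r34) * ((1 + r12) * (1 + r34) - 4.0 * r13 ** 2)


def chi_square_54(n_cases, det_h0, det_h1):
    """Statistic (54): (N - 1) * (ln|Sigma_H0| - ln|Sigma_H1|)."""
    return (n_cases - 1) * (math.log(det_h0) - math.log(det_h1))


# Hypothetical estimates, chosen only to exercise the formulas.
v1, c12, v3, c34, c13 = 2.0, 1.0, 3.0, 1.5, 0.6
full = det_parallel_forms(v1, c12, v3, c34, c13)
via_r = v1 ** 2 * v3 ** 2 * det_correlation_factor(
    c12 / v1, c34 / v3, c13 / math.sqrt(v1 * v3))
print(round(full, 6), round(via_r, 6))  # both equal 18.09
```

In an application, `det_h1` would come from the dotted estimates and `det_h0` from the tilde estimates. When $\hat P > 1$ the two coincide, and the statistic is zero.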

With the aid of (50), the computed value of $\chi^2$ may be looked up in a table for chi square with 1 degree of freedom. Since $\chi_1$ is strictly normally distributed ([7], p. 408), however, it will be more convenient to work with $\chi_1$ instead of $\chi_1^2$ and to use the normal curve table instead of the chi square table. Only the positive tail of the normal distribution should be used to obtain the significance level. Thus $\chi_1 = 1.64$ is at the five per cent level, $\chi_1 = 1.96$ is at the 2½ per cent level, and $\chi_1 = 2.33$ is at the one per cent level.
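The reason only the positive normal tail is needed can be made explicit. For chi square with one degree of freedom, $F(x^2) = 2\Phi(x) - 1$, where $\Phi$ is the unit normal distribution function. By (50), therefore, $\Pr(\chi_1 \le x) = \tfrac12 + \tfrac12 F(x^2) = \Phi(x)$ for $x \ge 0$.

The Python sketch below checks this identity numerically and converts a computed $\chi_1^2$ into a one-tailed significance level. The function names are hypothetical; it uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mixture_cdf(x):
    """Prob(chi_1 <= x) for x >= 0 under (50): half the probability sits at
    zero and half follows a chi variable with one degree of freedom."""
    chi2_1_cdf = 2.0 * UNIT_NORMAL.cdf(x) - 1.0   # F(x**2) for 1 df
    return 0.5 + 0.5 * chi2_1_cdf


def significance_level(chi_square):
    """One-tailed normal significance level for chi_1 = sqrt(chi_square)."""
    chi_1 = max(chi_square, 0.0) ** 0.5
    return 1.0 - UNIT_NORMAL.cdf(chi_1)


for chi_1 in (1.64, 1.96, 2.33):
    print(chi_1, round(significance_level(chi_1 ** 2), 4))
```

The three printed levels come out at roughly .05, .025 and .01, matching the values quoted in the text.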

If for any given set of data the value of $\hat P$ is found greatly to exceed 1.00, the experimenter should reconsider the assumptions underlying the foregoing significance test. Too large a value of $\hat P$ may be due to lack of independence in the errors of measurement of the test scores, or to lack of parallelism between supposedly parallel tests. Lack of parallelism can be tested by use of Votaw's test [8], as already noted. Independence of the errors of measurement must be guaranteed in advance by careful experimental design.


REFERENCES
[1] Chernoff, H. On the distribution of the likelihood ratio. Ann. math. Stat., 1954, 25, 573-578.
[2] Cramér, H. Mathematical methods of statistics. Princeton, N. J.: Princeton Univ. Press, 1946.
[3] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.
[4] Kelley, T. L. Fundamentals of statistics. Cambridge: Harvard Univ. Press, 1947.
[5] [Author illegible in source.] A study of speed factors in tests and academic grades. Psychometrika, 1956.
[6] Mood, A. M. Introduction to the theory of statistics. New York: McGraw-Hill, 1950.
[7] Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their mathematical bases. New York: McGraw-Hill, 1940.
[8] Votaw, D. F., Jr. Testing compound symmetry in a normal multivariate distribution. Ann. math. Stat., 1948, 19, 447-473.


Manuscript received 12/10/56 




PSYCHOMETRIKA—VOL. 22, NO. 3 
SEPTEMBER, 1957 


MARKOV PROCESSES IN LEARNING THEORY 


JOHN G. KEMENY AND J. LAURIE SNELL
DARTMOUTH COLLEGE 


Consideration is given to mathematical problems arising in two learning theories, one developed by Bush and Mosteller, the other developed by Estes. The theory of Bush and Mosteller leads to a class of Markov processes which have been studied in considerable detail (see [1] and [7]). The Estes model can be treated as a Markov chain, i.e., a Markov process with a finite number of states. For an important class of special cases, it is shown that the Bush-Mosteller model is, in a sense, a limiting form of the Estes model. The limiting probability distributions are derived for the cases treated in both models.


1. Summary of Results 


The Markov chains discussed in the early sections arise from a study of a learning model proposed by Estes [3]. Only a special case which leads to the Markov chains will be discussed in this paper.

In a sequence of experiments, a subject is to give one of two responses, $R_1$ or $R_2$. After his response, the experimenter makes one of two possible reinforcing actions, $A_1$ or $A_2$. It is assumed that the choice of the subject is determined (in a probabilistic sense) by a set of $n$ stimulus elements. Each of these stimulus elements at the beginning of an individual experiment is connected to one of the two possible responses. These connections change as the experiments proceed.

Before making his choice, the subject either samples or does not sample each of these stimulus elements. It is assumed that the sampling gives a set of $n$ independent trials with probability $\theta$ that a particular stimulus element be sampled (in the more general model this probability depends upon the stimulus element). It is assumed that the probability that the subject makes response $R_1$ on a given experiment is equal to the proportion of elements connected to $R_1$ in the set which he samples.

The action of the experimenter is assumed to change all the connections of the stimulus elements of the set sampled to agree with this action. For example, if the experimenter does $A_1$, then all elements which are sampled and which were connected to $R_2$ become connected to $R_1$.

To specify the process completely, it is necessary to give the method used by the experimenter to generate his $A$'s. Various different possibilities lead to different processes. In this paper consideration is restricted to the case where the experimenter chooses $A_1$ with probability $p$ and $A_2$ with probability $1 - p$, independent of the choice of the subject. In this case, a Markov chain is obtained by taking as a state the number of stimulus elements connected to $R_1$ at the beginning of a single experiment. This is the Markov chain given in section 2.

*This research was supported by the National Science Foundation through a grant given to the Dartmouth Mathematics Project.

In section 3, a method for calculating the limiting distribution for these 
Markov chains is given. In section 4 it is shown that these distributions also 
arise from a very simple probabilistic process. In many applications of the 
Estes theory the learning parameter $\theta$ is believed to be small. In section 5 it is shown that for small values of $\theta$ the limiting distribution may be approximated by the binomial distribution with parameter $p$.

Once the limiting distribution for the above Markov chain is known, it is possible to find the limiting probability for a response of each type by the subject. It is then possible to predict from the model how the subject will behave, in terms of how the experimenter proceeds. For example, in the case studied in this paper, the model predicts that, if the experimenter chooses $A_1$ with probability $p$, the limiting probability that the subject will choose $R_1$ is $p$. In other words, the subject will tend to match the behavior of the experimenter. Experiments have verified that this is generally true. For a more detailed discussion of these results the reader is referred to Kemeny, Snell, and Thompson [8].

In section 6 it is shown that, for a fixed value of $\theta$, as the number of stimulus elements tends to infinity the limiting distributions tend to a distribution $F$ which depends only on $\theta$ and $p$. In section 7 it is shown that the distribution $F$ is the limiting distribution for a Markov process of the Bush-Mosteller type. This result connects the two theories and suggests that the Bush-Mosteller model may be thought of as a limiting case of the Estes model.

In sections 8 through 11 the distribution $F$ is studied. It is shown that its character depends essentially on the value of $\theta$. As in the case of the limiting distribution for the Estes model, it is possible to give a very simple probability process which leads to the distribution $F$. However, the function $F$ itself is extremely complicated for most values of $\theta$.

It is interesting to note that some famous pathological distributions, constructed by mathematicians as counterintuitive examples, actually occur in a practical application. For example, the Cantor set can be defined as the set of all numbers on the unit interval whose expansion in the number system to the base 3 does not contain the digit 1. For one value of $\theta$ (namely $\theta = 2/3$; see section 8) the distribution $F$ is concentrated on this strange set.


2. A Class of Markov Chains 


In this section the class of Markov chains that arise in the Estes learning model is discussed. These chains can be described in terms of drawing balls out of urns.


JOHN G. KEMENY AND J. LAURIE SNELL 223 


Two urns, A and B, contain altogether $n$ balls. An urn is chosen, and each ball in that urn is removed with some fixed probability and placed in the other urn. Assume that the probability that urn A is chosen is $1 - p$, and that urn B is chosen is $p$. Assume that each ball is chosen with probability $\theta$, and that the choices are independent of one another.

Form a Markov chain by taking as states the number of balls in urn A. A single step then is the result of the transfer of a certain number of balls from one urn to the other. The transition probabilities are:

(1) $p_{jk} = p\binom{n-j}{k-j}\theta^{k-j}(1-\theta)^{n-k}, \quad j < k \le n,$
$\quad\;\; p_{jk} = (1-p)\binom{j}{k}\theta^{j-k}(1-\theta)^{k}, \quad 0 \le k < j,$
$\quad\;\; p_{jj} = p(1-\theta)^{n-j} + (1-p)(1-\theta)^{j}.$

For each choice of $n$ and $\theta$ the resulting Markov chain will have a limiting distribution. Denote the limiting probability of being in state $j$ by $q_j^{(n)}(\theta)$, or simply by $q_j^{(n)}$ in any discussion where $\theta$ is fixed.
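As a check on (1), the short Python sketch below builds the transition matrix for given $n$, $\theta$, $p$ and approximates the limiting distribution by repeated multiplication. The helper names are hypothetical, and the iteration count is an arbitrary choice that is ample for small $n$. This numerical route is not the method of the paper, which obtains the limiting probabilities analytically in the following sections.

```python
from math import comb


def transition_matrix(n, theta, p):
    """Transition probabilities (1); state j is the number of balls in urn A."""
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        # Urn B chosen (probability p): k - j of its n - j balls move to A.
        for k in range(j, n + 1):
            P[j][k] += p * comb(n - j, k - j) * theta ** (k - j) * (1 - theta) ** (n - k)
        # Urn A chosen (probability 1 - p): j - k of its j balls move to B.
        for k in range(j + 1):
            P[j][k] += (1 - p) * comb(j, k) * theta ** (j - k) * (1 - theta) ** k
    return P


def limiting_distribution(P, steps=2000):
    """Approximate the limiting distribution by repeating q <- qP."""
    size = len(P)
    q = [1.0 / size] * size
    for _ in range(steps):
        q = [sum(q[i] * P[i][k] for i in range(size)) for k in range(size)]
    return q


P = transition_matrix(3, 0.5, 0.5)
print([round(v, 6) for v in limiting_distribution(P)])  # [0.25, 0.25, 0.25, 0.25]
```

Each row of the matrix sums to one by construction: the urn-B part contributes $p$ and the urn-A part contributes $1 - p$.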


3. A Recursion Relation for the Limiting Probabilities 


In this section a recursion relation is obtained which enables one to calculate the limiting probabilities $q_j^{(n)}$ for a given $n$ from the knowledge of these probabilities for smaller values of $n$.

In order to do this, consider a slightly different description of the Markov chain. Suppose the order is reversed: first draw out each of the balls from each of the urns with probability $\theta$ for each ball; then take the subset of balls obtained and with probability $p$ put them all in urn A and with probability $1 - p$ put them in urn B. Then the same Markov chain described in section 1 is obtained.

With this description, consider the limiting probability that there are $j$ balls in urn A. Obtain this probability by finding, first, the probability that $k \le j$ balls are chosen and put into urn A and that, of the $n - k$ not chosen, $j - k$ were in urn A; secondly, the probability that $k \le n - j$ balls are chosen and put into urn B and that, of the $n - k$ not chosen, $j$ were in urn A. This leads to the following recursion relations:

(2) $q_j^{(n)} = p\sum_{k=0}^{j}\binom{n}{k}\theta^{k}(1-\theta)^{n-k}q_{j-k}^{(n-k)} + (1-p)\sum_{k=0}^{n-j}\binom{n}{k}\theta^{k}(1-\theta)^{n-k}q_{j}^{(n-k)},$
$\quad\;\; q_0^{(0)} = 1.$


4. An Auxiliary Process

As is often the case with Markov chains of this kind, there is a rather simple way to describe the limiting distribution.


Consider the $n$ balls outside the two urns. Choose each ball with probability $\theta$ and put the subset obtained into urn A with probability $p$, and into urn B with probability $1 - p$. Repeat this procedure on the set of balls which were not chosen the first time. Continue this process until all the balls are put into the urns. Then the limiting probability $q_j^{(n)}$ for our Markov chain is the probability that this new process puts $j$ balls in urn A. The proof of this consists in observing that these probabilities also satisfy the set of recursion relations (2). This process will be referred to as the auxiliary process.

The recursion relation (2) does not seem to have a simple solution. However, the situation can be simplified somewhat as follows. Let $s_n = q_n^{(n)}$. That is, $s_n$ is the limiting probability that all the balls are in urn A. Then from (2) one obtains the following recursion relation for $s_n$:

(3) $s_n\left[1 - (1-\theta)^n\right] = p\sum_{k=1}^{n}\binom{n}{k}\theta^{k}(1-\theta)^{n-k}s_{n-k},$
$\quad\;\; s_0 = 1.$


All of the other limiting probabilities can be obtained from the knowledge of the $s_n$. In the auxiliary process, number the balls. Let $A_j$ be the event that the $j$th ball ends up in urn A. Then

$P[A_i] = s_1,$
$P[A_i \text{ and } A_j] = s_2, \quad i \ne j,$

and in general the probability that some particular $r$ of the events occur is $s_r$. Moreover $q_j$ is the probability that exactly $j$ of these events occur. Then (see Feller [4], p. 64)

(4) $q_j^{(n)} = \sum_{r=j}^{n}(-1)^{r-j}\binom{r}{j}\binom{n}{r}s_r.$
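The recursion (3) and formula (4) are easy to mechanize. The Python sketch below uses exact rational arithmetic from the standard `fractions` module, so its output can be compared directly with the chain (1); the function names are hypothetical. For $n = 2$, $\theta = 1/2$, $p = 1/3$, the probabilities it returns satisfy $q = qP$ exactly for the transition matrix of (1).

```python
from fractions import Fraction
from math import comb


def s_values(n_max, theta, p):
    """Exact s_0, ..., s_{n_max} from the recursion (3)."""
    s = [Fraction(1)]
    for n in range(1, n_max + 1):
        rhs = p * sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k) * s[n - k]
                      for k in range(1, n + 1))
        s.append(rhs / (1 - (1 - theta) ** n))
    return s


def q_values(n, theta, p):
    """Exact limiting probabilities q_0, ..., q_n from (4)."""
    s = s_values(n, theta, p)
    return [sum((-1) ** (r - j) * comb(r, j) * comb(n, r) * s[r]
                for r in range(j, n + 1))
            for j in range(n + 1)]


print(q_values(2, Fraction(1, 2), Fraction(1, 3)))
# [Fraction(14, 27), Fraction(8, 27), Fraction(5, 27)]
```

Passing `theta` and `p` as `Fraction` objects keeps every step exact. Floats would also work, at the cost of rounding error from the alternating sum in (4).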
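5. Some Limiting Cases

If in (3) $\theta$ approaches zero (divide both sides by $\theta$; on the left $1 - (1-\theta)^n \approx n\theta$, and on the right only the $k = 1$ term survives), the recursion relation is

$s_n = p\,s_{n-1}, \quad n \ge 1,$
$s_0 = 1.$

The solution of this is $s_n = p^n$. From (4), the limit of $q_j^{(n)}(\theta)$ as $\theta \to 0$ is the binomial distribution with parameter $p$. That is

$\lim_{\theta\to 0} q_j^{(n)}(\theta) = \binom{n}{j}p^{j}(1-p)^{n-j}.$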




This is intuitively clear, since in our auxiliary process, $\theta$ near 0 means essentially putting the balls in the urns one at a time. If they were put in one at a time, the binomial distribution would be obtained.

If in (3) $\theta$ approaches one, then the recursion equations become

$s_n = p\,s_0, \quad n \ge 1,$
$s_0 = 1.$

From (4)

$\lim_{\theta\to 1} q_j^{(n)}(\theta) = \begin{cases} p, & j = n,\\ 0, & 0 < j < n,\\ 1 - p, & j = 0. \end{cases}$

This again is intuitively clear, since, for $\theta$ near one in this auxiliary process, it is almost certain that all the balls will be chosen at once, and then it is just a matter of which urn they are placed in.

Finally consider the case $\theta = 1/2$, $p = 1/2$. In this case $q_j^{(n)} = 1/(n+1)$ for all $j$, a uniform distribution. This case, unlike the last two, does not seem to be suggested by intuition.
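The uniform case can be verified directly from (3) and (4). The worked derivation below is a supplementary sketch, not part of the original argument.

With $\theta = p = 1/2$, every factor $\theta^k(1-\theta)^{n-k}$ equals $2^{-n}$. The trial solution $s_m = 1/(m+1)$ then satisfies (3), because $\sum_{m=0}^{n}\binom{n}{m}\frac{1}{m+1} = \int_0^1 (1+x)^n\,dx = \frac{2^{n+1}-1}{n+1}$:

```latex
p\sum_{k=1}^{n}\binom{n}{k}\theta^{k}(1-\theta)^{n-k}s_{n-k}
  = 2^{-n-1}\sum_{m=0}^{n-1}\binom{n}{m}\frac{1}{m+1}
  = 2^{-n-1}\,\frac{2^{n+1}-2}{n+1}
  = \frac{1-2^{-n}}{n+1}
  = s_n\bigl[1-(1-\theta)^{n}\bigr].

% With 1/(r+1) = \int_0^1 \lambda^r d\lambda and
% \binom{r}{j}\binom{n}{r} = \binom{n}{j}\binom{n-j}{r-j}, formula (4) gives
q_j^{(n)} = \binom{n}{j}\int_0^1 \lambda^{j}(1-\lambda)^{n-j}\,d\lambda = \frac{1}{n+1}.
```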


6. The Limiting Distribution for Large n

Referring again to the auxiliary process, let $X_j$ be a random variable which is 1 if the $j$th ball is put in urn A and 0 otherwise. Let

$S_n = (X_1 + X_2 + \cdots + X_n)/n.$

The distribution of $S_n$, then, is the limiting distribution for our Markov chain normalized by dividing by $n$. The distribution of $S_n$ is a discrete distribution on the unit interval which puts mass $q_j$ at the point $j/n$.

The $j$th moment of $S_n$ is, by the multinomial expansion,

$E[S_n^j] = \dfrac{1}{n^j}\sum_{r_1 + \cdots + r_n = j}\dfrac{j!}{r_1!\,r_2!\cdots r_n!}\,E\left[X_1^{r_1}X_2^{r_2}\cdots X_n^{r_n}\right].$

Here $E[X_1^{r_1}X_2^{r_2}\cdots X_n^{r_n}] = s_k$, where $k$ is the number of $r_i$ which are not zero (each $X_i$ is 0 or 1, so $X_i^{r} = X_i$ for $r \ge 1$, and any $k$ particular balls all end in urn A with probability $s_k$). Thus, rewrite the sum in the form

$E[S_n^j] = \sum_k a_k s_k, \quad \text{where } \sum_k a_k = 1.$

If $k > n$, then $a_k = 0$. In the case $k = j \le n$, all $r_i$ are 0 or 1. The coefficient of the term $s_j$ in this case is

$a_j = n(n-1)\cdots(n-j+1)/n^j, \quad j \le n.$

Letting $n$ tend to infinity, for fixed $j$, $a_j$ tends to 1 and, since the sum of the $a_k$ is one, the remaining terms must tend to 0. Thus

$\lim_{n\to\infty} E[S_n^j] = s_j.$

Thus the random variables $S_n$ converge in distribution to a random variable $S$ whose moments are given by $s_0, s_1, \ldots$. Denote the distribution of $S$ by $F(\lambda)$.
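For instance, with $j = 2$ the expansion above has only two kinds of terms, and the coefficients $a_k$ can be written down explicitly. This worked case is supplied here for illustration and uses $X_i^2 = X_i$:

```latex
E[S_n^2] = \frac{1}{n^2}\Bigl[\sum_{i} E(X_i^2) + \sum_{i \ne l} E(X_i X_l)\Bigr]
         = \frac{1}{n^2}\bigl[n\,s_1 + n(n-1)\,s_2\bigr]
         = \frac{1}{n}\,s_1 + \Bigl(1 - \frac{1}{n}\Bigr)s_2
         \;\longrightarrow\; s_2 \qquad (n \to \infty).
```

Here $a_1 = 1/n$ and $a_2 = n(n-1)/n^2$, which sum to one, as required.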


7. The Bush-Mosteller Markov Process

In the last section it was shown that, when $n$ approaches infinity, the distribution determined by the weights $q_j$ tends to a limiting distribution $F(\lambda)$ on the unit interval. It will now be shown that $F(\lambda)$ is the limiting distribution of the Markov process defined by

$x \to (1-\theta)x + \theta$ with probability $p$,
$x \to (1-\theta)x$ with probability $1 - p$.

This is a process which moves on the unit interval so that if at any time it is at $x$, it moves on the next step to $(1-\theta)x + \theta$ with probability $p$, and to $(1-\theta)x$ with probability $1 - p$. This is a special case of the Bush-Mosteller learning process.

If this process starts at $x$, then the possible positions after $n$ steps can be written as

$(1-\theta)^n x + \sum_{j=1}^{n} a_j\,\theta(1-\theta)^{j-1},$

where $a_j$ is 1 or 0. The probability of being at any particular such point is $p^r(1-p)^{n-r}$, where $r$ is the number of the $a_j$ which are 1.

Thus the distribution of the position of the process after $n$ steps is the same as that of the random variable

$S_n = (1-\theta)^n x + \sum_{j=1}^{n} \epsilon_j\,\theta(1-\theta)^{j-1},$

where $\epsilon_j$, $j = 1, 2, \ldots$, are independent random variables such that $\epsilon_j = 1$ or 0 with probability $p$ and $1 - p$, respectively. Thus any question involving the distribution of the position can be answered by studying the sequence $\{S_n\}$, $n = 1, 2, \ldots$.

It is clear that the distribution of $S_n$ approaches a limiting distribution independent of the initial point $x$, and this distribution is in fact the distribution of the infinite sum

$S = \sum_{j=1}^{\infty} \epsilon_j\,\theta(1-\theta)^{j-1}.$

To show that this limiting distribution is the same as $F(\lambda)$ found in section 6, it is sufficient to show that the moments of the distribution of $S$ are the same as those of $F(\lambda)$.

Let $F_n$ be the distribution of $S_n$. Then

$E[(S_{n+1})^j \mid S_n = x] = p\left[(1-\theta)x + \theta\right]^j + (1-p)\left[(1-\theta)x\right]^j.$

Thus

$E[(S_{n+1})^j] = p\int\left[(1-\theta)x + \theta\right]^j dF_n(x) + (1-p)\int\left[(1-\theta)x\right]^j dF_n(x).$

Passing to the limit, and expanding $\left[(1-\theta)x + \theta\right]^j$ by the binomial theorem,

$E[S^j] = p\sum_{k=0}^{j}\binom{j}{k}\theta^{k}(1-\theta)^{j-k}E[S^{j-k}] + (1-p)(1-\theta)^{j}E[S^j],$
$E[S^0] = 1,$

that is, $E[S^j]\left[1 - (1-\theta)^j\right] = p\sum_{k=1}^{j}\binom{j}{k}\theta^{k}(1-\theta)^{j-k}E[S^{j-k}]$.

This recursion relation is the same as (3), found for the values $s_j$. Thus the $j$th moment of the limiting distribution is the same as that of $F(\lambda)$, namely $s_j$. Therefore the limiting distribution must itself be $F(\lambda)$.

If $\theta$ tends to 0, then $E[S^j]$ approaches $p^j$, as was shown in section 5. Thus the limiting distribution will be concentrated at the point $p$.
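The identification of $F(\lambda)$ with the limit of the Bush-Mosteller process can also be checked by simulation. In the Python sketch below, many independent runs of the process are compared with the moments $s_j$ obtained from recursion (3). The function names are hypothetical, and the values $\theta = 0.3$, $p = 0.6$, the number of steps and the random seed are arbitrary choices. The empirical and exact moments should agree to within sampling error.

```python
import random
from math import comb


def s_values(j_max, theta, p):
    """s_0, ..., s_{j_max} from the recursion (3), in floating point."""
    s = [1.0]
    for n in range(1, j_max + 1):
        rhs = p * sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k) * s[n - k]
                      for k in range(1, n + 1))
        s.append(rhs / (1 - (1 - theta) ** n))
    return s


def simulate_positions(theta, p, x0=0.5, steps=50, runs=20_000, seed=1):
    """Final positions of independent runs of the Bush-Mosteller process
    x -> (1 - theta) x + theta (prob. p) or x -> (1 - theta) x (prob. 1 - p)."""
    rng = random.Random(seed)
    finals = []
    for _ in range(runs):
        x = x0
        for _ in range(steps):
            x = (1 - theta) * x + (theta if rng.random() < p else 0.0)
        finals.append(x)
    return finals


theta, p = 0.3, 0.6
finals = simulate_positions(theta, p)
exact = s_values(3, theta, p)
for j in (1, 2, 3):
    empirical = sum(x ** j for x in finals) / len(finals)
    print(j, round(empirical, 4), round(exact[j], 4))
```

After 50 steps the influence of the starting point is of order $(0.7)^{50}$, which is negligible here. Any remaining discrepancy therefore reflects sampling error only.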


8. The Distribution of S for the Case θ > 1/2

Since $S$ is independent of the initial position, assume that the initial position is 0. Then

$S = \sum_{j=1}^{\infty}\epsilon_j\,\theta(1-\theta)^{j-1}, \qquad S_n = \sum_{j=1}^{n}\epsilon_j\,\theta(1-\theta)^{j-1}, \qquad S_0 = 0.$

Assume that the first $n$ of the $\epsilon_j$ are known, i.e., that the position of $S_n$, call it $x$, is known. What then are the possibilities for $S$? If $\epsilon_{n+1} = 1$, then

$S \ge x + \theta(1-\theta)^n.$

On the other hand, if $\epsilon_{n+1} = 0$, then the most that $S$ could be is found by making all future $\epsilon_j$'s equal to 1. That is

$S \le x + \sum_{j=n+2}^{\infty}\theta(1-\theta)^{j-1} = x + (1-\theta)^{n+1}.$

Thus, if $S_n = x$, then it is impossible that $S$ be in the open interval $\left(x + (1-\theta)^{n+1},\; x + \theta(1-\theta)^n\right)$, which is nonempty precisely because $\theta > 1/2$. The intervals obtained in this way by considering all possible values for $S_n$, for all possible $n$, are disjoint and have total length 1. The value of $S$ cannot lie in these intervals. The possible values for $S$ are all $x$ such that

$x = \sum_{j=1}^{\infty} a_j\,\theta(1-\theta)^{j-1},$

where $a_j$ is 0 or 1. The set of all such $x$ is a set of measure zero. In the case $\theta = 2/3$, this set is the Cantor set.

The function $F(\lambda)$ is constant on the interior of all the intervals described above which cannot contain $S$. Moreover, if $x = \sum_{j=1}^{n} a_j\,\theta(1-\theta)^{j-1}$ is a possible position for $S_n$, then $S$ will lie in the interval $[x, x + (1-\theta)^n]$ if and only if $\epsilon_j = a_j$ for $j = 1, \ldots, n$. This will occur with probability $p^r(1-p)^{n-r}$, where $r$ is the number of the $a_j$ which are 1. Thus

$F\left(x + (1-\theta)^n\right) - F(x) = p^r(1-p)^{n-r}.$

It is clear that this determines $F$ completely and in fact that $F$ is a continuous function but not absolutely continuous. For a detailed discussion of this function $F$ for the case $\theta = 2/3$, see Hille and Tamarkin [5].

It is interesting to observe the following fact for the case $\theta > 1/2$. Let $E$ be the set of possible values for $S$. If the process begins in $E$, then it never enters the complement of $E$. On the other hand, if it starts in the complement of $E$, it never enters $E$. But as $n$ increases the position approaches $E$, and the limiting distribution is independent of the starting position.
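The claim that the excluded intervals have total length 1 can be checked directly; the computation below is supplementary. For each $n \ge 0$ there are $2^n$ possible values of $S_n$. Each gives an excluded interval of length $\theta(1-\theta)^n - (1-\theta)^{n+1} = (2\theta - 1)(1-\theta)^n$. Since $2(1-\theta) < 1$ when $\theta > 1/2$,

```latex
\sum_{n=0}^{\infty} 2^{n}(2\theta-1)(1-\theta)^{n}
  = \frac{2\theta-1}{1-2(1-\theta)}
  = \frac{2\theta-1}{2\theta-1}
  = 1 .
```

For $\theta = 2/3$ the first excluded interval, coming from $S_0 = 0$, is $(1/3, 2/3)$, the familiar middle third.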


9. The Case θ = 1/2

In this case the possible values of $S$ are all real numbers on the unit interval $[0, 1]$. This is the case since the possible values of $S$ are numbers $x = \sum_j a_j/2^j$, where $a_j$ is 0 or 1. But such an $x$ is the number on the unit interval whose dyadic representation is $.a_1a_2a_3\cdots$.

Let $[a, b]$ be an interval with

$a = .a_1a_2\cdots a_n000\cdots,$
$b = .a_1a_2\cdots a_n111\cdots,$

in dyadic representation. Then

$F(b) - F(a) = p^r(1-p)^{n-r},$

where $r$ is the number of the $a_j$ which are 1.

In the case $p = 1/2$, $1/2^n$ is the difference of $F$ on such an interval. This determines $F(\lambda)$ as the uniform distribution, i.e., $F(\lambda) = \lambda$.

In the case $p \ne 1/2$, it will be shown that $F(\lambda)$ is not absolutely continuous. Let $R$ be the set of all $x$ between 0 and 1 which, when expressed in dyadic form, have the property that the proportion of 1's in the first $n$ digits approaches $p$. Then, by the law of large numbers, the measure determined by $F(\lambda)$ assigns measure one to the set $R$. On the other hand, for $p = 1/2$ the limiting distribution determines the ordinary Lebesgue measure. By the law of large numbers, this measure assigns measure one to the set of points whose dyadic representation is such that the proportion of 1's in the first $n$ digits approaches 1/2. Hence, $R$ has Lebesgue measure 0. Thus the measure determined by $F(\lambda)$ assigns measure one to a set of Lebesgue measure 0. Thus $F(\lambda)$ is not absolutely continuous.
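For $\theta = 1/2$ the distribution function can be evaluated exactly at any terminating dyadic point. $S < x$ exactly when, at the first digit where the two differ, $x$ has a 1 and $S$ has a 0. The Python sketch below uses this to tabulate $F$; the function name is hypothetical. With $p = 1/2$ it reproduces $F(\lambda) = \lambda$. With $p \ne 1/2$, the increment over $[a, b]$ equals $p^r(1-p)^{n-r}$, as stated above.

```python
def F_dyadic(bits, p):
    """F(x) = Prob(S < x) for theta = 1/2, where x = .a_1 a_2 ... a_n is a
    terminating dyadic fraction given as a list of 0/1 digits, and the digits
    of S are independent, each equal to 1 with probability p."""
    prefix_prob = 1.0   # probability that the digits of S so far match those of x
    total = 0.0
    for a in bits:
        if a == 1:
            total += prefix_prob * (1 - p)   # a 0 in S here puts S below x
            prefix_prob *= p
        else:
            prefix_prob *= 1 - p
    return total


a, b = [1, 0, 1], [1, 1, 0]          # x = 0.625 and x = 0.75 in binary
for p in (0.5, 0.3):
    print(p, F_dyadic(a, p), F_dyadic(b, p) - F_dyadic(a, p))
```

Since $F$ is continuous, it makes no difference whether $\Pr(S < x)$ or $\Pr(S \le x)$ is computed.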


10. The Distribution of S for θ < 1/2, p = 1/2

A sequence of values for $\theta$ will be given which approach zero and such that the limiting distribution for these values is absolutely continuous. It was shown by Jessen and Wintner [6] that the distribution of $S$ in this case is either absolutely continuous or purely singular. Erdős [2], on the other hand, showed that for a certain class of values of $\theta$ the distribution is purely singular. It is not known whether this class includes values arbitrarily near 0, but it does include values less than 1/2. One such choice for $\theta$ occurs when $1 - \theta$ is the "Fibonacci" value $\tfrac12(\sqrt5 - 1)$.

The sequence of values for which the limiting distribution is absolutely continuous consists of the sequence $\theta_n = 1 - 1/\sqrt[n]{2}$, $n = 1, 2, \ldots$. For such a value of $\theta$

$S = \sum_{j=1}^{\infty}\epsilon_j\left(1 - 1/\sqrt[n]{2}\right)\left(1/\sqrt[n]{2}\right)^{j-1}.$

This sum can be written as the sum of $n$ subsequences, each of which is a constant times a series of the form $\sum_i \epsilon_i/2^i$, where $\epsilon_i$ is 0 or 1 with probability 1/2. Thus

$S = \sum_{j=1}^{n} a_j X_j,$

where the $X_j$, $j = 1, \ldots, n$, are independent random variables with a uniform distribution on the interval $[0, 1]$. The actual values of $a_j$ are given by

$a_j = 2\left(1 - 1/\sqrt[n]{2}\right)/2^{(j-1)/n}, \quad j = 1, 2, \ldots, n.$

The function $F_n$ can be found, following the method of Uspensky ([9], p. 277); it is piecewise a polynomial function.
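The splitting of $S$ into $n$ uniform components can be checked numerically. The Python sketch below confirms that the weights $a_j$ sum to one, so $S$ still ranges over $[0, 1]$. It also confirms that each term $\theta_n(1-\theta_n)^{i-1}$ of the series coincides with $a_m/2^{k+1}$ when $i - 1 = nk + (m - 1)$. The function names are hypothetical, and the 60-term truncation is arbitrary. For $n = 1$ the construction recovers the uniform case $\theta = 1/2$ of section 9.

```python
import math


def weights(n):
    """a_j = 2(1 - 2**(-1/n)) / 2**((j - 1)/n), j = 1, ..., n (section 10)."""
    c = 2.0 * (1.0 - 2.0 ** (-1.0 / n))
    return [c * 2.0 ** (-(j - 1) / n) for j in range(1, n + 1)]


def subsequence_split_holds(n, terms=60):
    """Check that the ith term theta*(1-theta)**(i-1) of S, with
    theta = 1 - 2**(-1/n), equals a_m / 2**(k+1) when i - 1 = n*k + (m - 1)."""
    theta = 1.0 - 2.0 ** (-1.0 / n)
    a = weights(n)
    for i in range(1, terms + 1):
        k, m0 = divmod(i - 1, n)           # m0 = m - 1 indexes a_m
        term = theta * (1.0 - theta) ** (i - 1)
        if not math.isclose(term, a[m0] / 2.0 ** (k + 1), rel_tol=1e-12):
            return False
    return True


print(weights(3), sum(weights(3)), subsequence_split_holds(3))
```

Each inner series $\sum_k \epsilon/2^{k+1}$, with fair digits, is uniform on $[0, 1]$. This is why the components $X_j$ are uniform.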


11. The Characteristic Function of S

Since $S$ is the infinite sum of independent random variables, the characteristic function of $S$, i.e., the Fourier transform of $F(\lambda)$, is the product of the characteristic functions of the summands. Thus

$E\left[e^{itS}\right] = \prod_{j=1}^{\infty}\left[1 - p + p\,e^{it\theta(1-\theta)^{j-1}}\right].$

In the case $p = 1/2$, this can be simplified to

$\exp(it/2)\prod_{j=1}^{\infty}\cos\left[\tfrac12 t\theta(1-\theta)^{j-1}\right].$
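Both forms can be compared numerically. The Python sketch below evaluates the truncated product and checks it against the cosine form when $p = 1/2$. For $\theta = p = 1/2$ it also checks the product against $e^{it/2}\sin(t/2)/(t/2)$, the characteristic function of the uniform distribution found in section 9. The function names are hypothetical, and the 200-factor truncation is an arbitrary choice.

```python
import cmath
import math


def char_fn_S(t, theta, p, terms=200):
    """Truncated product for E[exp(itS)], S = sum_j eps_j * theta * (1-theta)**(j-1)."""
    value = 1 + 0j
    for j in range(1, terms + 1):
        w = theta * (1 - theta) ** (j - 1)
        value *= (1 - p) + p * cmath.exp(1j * t * w)
    return value


def char_fn_cosine_form(t, theta, terms=200):
    """The p = 1/2 form: exp(it/2) * prod_j cos(t * theta * (1-theta)**(j-1) / 2)."""
    prod = math.prod(math.cos(0.5 * t * theta * (1 - theta) ** (j - 1))
                     for j in range(1, terms + 1))
    return cmath.exp(0.5j * t) * prod


def char_fn_uniform(t):
    """Characteristic function of the uniform distribution on [0, 1]."""
    return cmath.exp(0.5j * t) * math.sin(t / 2) / (t / 2) if t else 1 + 0j


print(abs(char_fn_S(3.0, 0.5, 0.5) - char_fn_uniform(3.0)))
```

The cosine form follows factor by factor from $\tfrac12(1 + e^{iu}) = e^{iu/2}\cos(u/2)$. The exponents $u_j/2$ sum to $t/2$ because $\sum_j \theta(1-\theta)^{j-1} = 1$.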


REFERENCES
[1] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.
[2] Erdős, P. Amer. J. Math., 1939, 61, 974-976.
[3] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.
[4] Feller, W. An introduction to probability theory and its applications, vol. I. New York: Wiley, 1950.
[5] Hille, E. and Tamarkin, J. D. Amer. math. Monthly, 1929, 36, 255-264.
[6] Jessen, B. and Wintner, A. Trans. Amer. math. Soc., 1935, 38, 61.
[7] Karlin, S. Pacific J. Math., 1953, 3, 725-756.
[8] Kemeny, J. G., Snell, J. L., and Thompson, G. L. Introduction to finite mathematics. New York: Prentice-Hall, 1957.
[9] Uspensky, J. V. Introduction to mathematical probability. New York: McGraw-Hill, 1937.

Manuscript received 7/18/56
Revised manuscript received 1/11/57




PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


NOTE ON THE LEAST SQUARES SOLUTION FOR THE 
METHOD OF SUCCESSIVE CATEGORIES* 


R. DARRELL BOCK
UNIVERSITY OF CHICAGO 


The problem of estimation in the method of successive categories is 
reconsidered and a new least squares solution is obtained. An empirical com- 
parison of this solution with Gulliksen’s solution is presented. 


Gulliksen [8] has proposed a least squares solution for the method of successive categories which, he suggests, is formally equivalent to Horst's [10] solution for the matrix of incomplete data. Because of the way in which the terms to be minimized by least squares are defined, there appears to be an important difference between the two solutions. If the purpose of the method is to estimate location and scale parameters for the distributions of preferences of the objects being rated, then Horst's definition is perhaps the more appropriate.

In Horst's formulation the observation matrix consists of quantitative ratings assigned to a number of objects by a number of raters. These ratings enter into the solution without grouping or other transformation. The aim of the solution is to characterize each object by some single score and to adjust the location and scale of the ratings of each rater so that the sum of squared discrepancies between the single scores and the adjusted ratings is minimal. If there are $p$ raters and $q$ objects, this sum ranges over $pq$ terms.

The original data to which Gulliksen's solution applies are also the ratings of a number of objects by a number of raters, but the ratings are made in coarse, successive categories and are grouped without retaining the identities of the individual raters. On the assumption of normally distributed affective values underlying the coarsely grouped ratings, the proportions of ratings for each object falling in the various categories are transformed to normal deviates. If there are $r$ categories, the resulting $r \times q$ table of deviates is considered the observation matrix. The problem is then to adjust the location and scale of deviates in the columns of this table so as to minimize the sum of squared discrepancies between the adjusted deviates and some single value assigned to the respective category boundary. Since the extreme boundaries are indeterminate, this sum ranges over $qr$ − [...] terms.
*Work on this paper has been supported by the [...] Institute for the Armed Forces. Views or conclusions contained herein are those of the author and do not necessarily reflect the views or endorsement of the Department of Defense.




Thus, Gulliksen's solution does not take the original ratings as observations, nor does it seek a transformation of the category boundaries which minimizes the discrepancies between the $p$ ratings received by the objects and the location parameters which characterize their mean affective values. Instead, it treats the normal deviates as observations and seeks a single [...] which minimizes squared discrepancies [...] location and scale. As a result, [...] only a few ratings, are [...] it may be possible to correct [...]; it should not be supposed that the Müller-Urban weights are appropriate for this purpose. The justification for the Müller-Urban weights is that the [...]

The solution which results from minimizing what is perhaps a more appropriate [...] several other respects. It yields direct [...], which may be used to quantify the ratings for further statistical treatment, for correlating ratings between raters or objects, for analysis of variance, etc. [...] in the frequency table cause no difficulty. The [...] results, which are rather different from [...], can be expressed entirely in matrix operations which are easy to follow during computations. The solution can be obtained either directly [...] labor. Critical steps in the computations involve matrices of order one less than the number of categories. An approximate solution is also possible.


Derivation of the Least Squares Solution

Data obtained by the method of successive categories may be exhibited in a frequency table of the form

(1) Categories $k = 1, 2, \ldots, r$ (rows) by objects $j = 1, 2, \ldots, q$ (columns), with cell frequencies $n_{kj}$, row totals $n_{k.}$, column totals $n_{.j}$, and grand total $n_{..}$.

By Pearson's formula (see Guilford [7], p. 237), normal deviates, $z_{kj}$, for the centroidal points of the intervals spanned by the categories in the distribution in the columns of (1) may be determined. Then, according to the law of categorical judgment, the population values $\zeta_{kj}$ corresponding to the deviates $z_{kj}$ will be related to the true scale values for the categories, $\xi_k$, by the relation

$\sigma_j\zeta_{kj} = \xi_k - \mu_j,$

where $\sigma_j$ is the discriminal dispersion and $\mu_j$ the mean affective value of the $j$th object. In the sample the relation between the corresponding estimates of these quantities would be

(2) $s_j z_{kj} \cong x_k - m_j.$


Following Horst [10], the problem is to determine values for $s_j$, $x_k$, and $m_j$ for which the congruence of the left and right members of (2) is maximal. A measure of congruence is given by the correlation between the right and left members of (2) based on the within-column sums of squares and cross products. Then

(3) $R^2 = \dfrac{\left[\sum_j\sum_k s_j(z_{kj} - \bar z_j)(x_k - m_j)n_{kj}\right]^2}{\left[\sum_j s_j^2\sum_k (z_{kj} - \bar z_j)^2 n_{kj}\right]\left[\sum_j\sum_k (x_k - m_j)^2 n_{kj}\right]},$

where

(4) $\bar z_j = \left(\sum_k z_{kj}n_{kj}\right)/n_{.j}.$

Since the maximum of $R^2$ is independent of the origin and unit of the assigned and derived scale values, impose the conditions


(5) p» 25 (zr, — ту) = 0, 




(6) $\sum_j s_j^2\sum_k (z_{kj} - \bar z_j)^2 n_{kj} = \sum_j\sum_k (x_k - m_j)^2 n_{kj} = c,$


where c is any finite constant. 


Introducing the undetermined multipliers $\lambda$, $\kappa$, and differentiating the expression


[> > 8/02 — 2.) — туп] — ASD ayy. — Σ тт.) 
= 42; Σο Gru — 2.) πμ, + 2 x (a, — m) nis — 2c] 


with respect to $m_j$, $x_k$, and $s_j$, and then equating to zero yield, respectively, the stationary equations (8), (9), and (10).


ο» > Sa. — 2.ε)(αι — πι) ηλ] Ds 


(7) 


δω — zm 


(8) 
= №; — κ > @ — тул, = 0, 
шо F Eten ελ, -- туы X ens — zm, — Xu. 
= κ 2 (x, — т) = 0, 
TT P» > Sz — za), — Τικ) κκ] > G — z.a — ту; 
— к > (д, — z.) n, = 0. 


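Summing (8) on $j$ or (9) on $k$ gives $\lambda = 0$. Multiplying (10) by $s_j$ and summing on $j$ gives $\kappa = R$ (both sums of squares in (6) equal $c$, so the multiplier equals the cross-product sum divided by $c$, which is $R$).

Substituting for $\lambda$ and $\kappa$ in (8) gives

(11) $\sum_k (x_k - m_j)n_{kj} = s_j\sum_k (z_{kj} - \bar z_j)n_{kj}/R.$

Since $\bar z_j = \left(\sum_k z_{kj}n_{kj}\right)/n_{.j}$, the right member of (11) vanishes, and

(12) $m_j = \left(\sum_k x_k n_{kj}\right)/n_{.j}.$

Substituting for $\lambda$ and $\kappa$ in (9) gives

$\sum_j (x_k - m_j)n_{kj} = \sum_j s_j(z_{kj} - \bar z_j)n_{kj}/R,$

and, substituting from (12),

(13) $x_k n_{k.} - \sum_j \left(\sum_l x_l n_{lj}\right)n_{kj}/n_{.j} = \sum_j s_j(z_{kj} - \bar z_j)n_{kj}/R.$

Substituting for $\kappa$ in (10) gives

$s_j\sum_k (z_{kj} - \bar z_j)^2 n_{kj} = \sum_k (z_{kj} - \bar z_j)(x_k - m_j)n_{kj}/R = \sum_k (z_{kj} - \bar z_j)x_k n_{kj}/R - m_j\sum_k (z_{kj} - \bar z_j)n_{kj}/R.$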


R. DARRELL BOCK 235 


But $\sum_k (z_{kj} - \bar z_j)n_{kj} = 0$, hence

(14) $s_j\sum_k (z_{kj} - \bar z_j)^2 n_{kj} = \sum_k (z_{kj} - \bar z_j)x_k n_{kj}/R.$


To express explicitly the solutions for the systems of equations (12), (13), and (14), it is convenient to adopt matrix notation and to define the

$r \times q$ matrix $T = [(z_{kj} - \bar z_j)n_{kj}]$,
$r \times q$ matrix $F = [n_{kj}]$,
$r \times r$ diagonal matrix $D_r = [n_{k.}]$,
$q \times q$ diagonal matrix $D_q = [n_{.j}]$,
$q \times q$ diagonal matrix $D_s = [\sum_k (z_{kj} - \bar z_j)^2 n_{kj}]$,
$1 \times r$ row vector $x = [x_k]$,
$1 \times q$ row vector $s = [s_j]$,
$1 \times q$ row vector $m = [m_j]$.


Then (12) may be written as the matrix equation

(15) $m = xFD_q^{-1},$

and (13) may be written

$x(D_r - FD_q^{-1}F') = sT'/R,$

or, writing $B = D_r - FD_q^{-1}F'$,

(16) $xB = sT'/R,$

and (14) may be written

(17) $sD_s = xT/R.$

(In these equations $R$ is a scalar.)

Equations (16) and (17) must be solved simultaneously. Substituting for $s$ in (16),

(18) $x(TD_s^{-1}T' - R^2B) = 0.$

Since the rows and columns of both $TD_s^{-1}T'$ and $B$ sum to zero, the determinant of $(TD_s^{-1}T' - R^2B)$ is identically zero for all values of $R^2$, and (18) cannot be solved as it stands. However, arbitrarily assign a value of zero to one of the elements of $x$, say $x_r$, and write (18) as

$x^*\left[(TD_s^{-1}T')^* - R^2B^*\right] = 0,$

where the asterisks indicate that the row and column corresponding to $x_r$ have been dropped. Compute the inverse of $B^*$ and write

(19) $x^*\left[(TD_s^{-1}T')^*B^{*-1} - R^2I\right] = 0.$


Thus, $x^*$ is the latent vector associated with the largest root of the determinantal equation
$$\left|(T D_z^{-1} T')^* B^{*-1} - R^2 I\right| = 0.$$
This root and vector may be evaluated readily by Hotelling’s method [11]. 
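At its core, Hotelling's method for the largest root is power iteration. The sketch below applies it to the left latent vector in (19): it repeatedly forms $v \leftarrow vM$ for a square matrix $M$ standing in for $(T D_z^{-1} T')^* B^{*-1}$. It assumes $M$ has already been computed and that its largest root is real, positive, and strictly dominant. It also omits the deflation step that Hotelling's method uses for further roots. The function name is illustrative.

```python
def dominant_left_vector(M, max_iter=1000, tol=1e-12):
    """Power iteration for the largest root lam and left vector v with v M = lam v.

    v is normalized so that its largest-magnitude element equals 1.
    """
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]  # w = v M
        scale = max(w, key=abs)            # tends to the dominant root
        w = [wj / scale for wj in w]
        converged = (abs(scale - lam) <= tol * abs(scale)
                     and max(abs(a - b) for a, b in zip(w, v)) <= tol)
        v, lam = w, scale
        if converged:
            break
    return lam, v


if __name__ == "__main__":
    lam, v = dominant_left_vector([[2.0, 1.0], [1.0, 3.0]])
    print(lam, v)   # largest root (5 + sqrt 5)/2 and its left vector
```

In the notation of (19), `lam` plays the role of $R^2$ and `v` that of $x^*$ up to scale. The vector would then be shifted by (20) and rescaled as described below.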
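It is easy to show that
$$x_k = x_1 + x^*_k,$$
where $x^*_1 = 0$. Hence, the subsidiary condition (5) may be used to find $x$ by determining $x_1$ from
$$x_1 = -\left(\sum_k x^*_k n_{k\cdot}\right) / n_{\cdot\cdot}. \tag{20}$$

A second iterative method, which requires fewer preliminary computations, uses
$$R\,x^* = s\,(T')^* B^{*-1} \tag{21}$$
and
$$R\,s = x\,T D_z^{-1}.$$
A trial vector $s_0$, e.g., $(1, 1, \ldots, 1)$, is used to begin the iteration, and the subsidiary condition (5) is applied as before.

The solutions obtained from either of these methods are brought to scale by the subsidiary condition (6). Thus, if $s_0$ and $x_0$ are the unscaled solutions, scale by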


$$s = s_0 \sqrt{c / (s_0 D_z s_0')}, \qquad x = x_0 \sqrt{c / (x_0 B x_0')},$$

where $c$ is any convenient constant, e.g., $(n_{\cdot\cdot} - q)$.
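The location step (20) and the scaling step above can be sketched as follows. The sketch assumes that $x^*$ comes from the latent-vector computation with $x^*_1 = 0$, and that $D_z$ is stored as its diagonal. The function names are hypothetical. Because the rows of $B$ sum to zero, the shift in (20) does not change $x B x'$, so the two steps can be applied in either order.

```python
import math


def locate(x_star, n_k):
    """Equation (20): shift so that sum_k x_k n_k. = 0, with x_k = x_1 + x*_k."""
    x1 = -sum(xs * nk for xs, nk in zip(x_star, n_k)) / sum(n_k)
    return [x1 + xs for xs in x_star]


def scale_to(v, M, c):
    """Rescale row vector v so that the quadratic form v M v' equals c (e.g. M = B)."""
    q = sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))
    return [vi * math.sqrt(c / q) for vi in v]


def scale_s(s0, dz_diag, c):
    """Same scaling for s, where D_z is diagonal: s D_z s' = sum_j s_j^2 d_j."""
    q = sum(sj * sj * dj for sj, dj in zip(s0, dz_diag))
    return [sj * math.sqrt(c / q) for sj in s0]


if __name__ == "__main__":
    n_k = [3, 7, 6]                        # made-up category totals n_k.
    print(locate([0.0, 1.0, 3.0], n_k))    # satisfies sum_k x_k n_k. = 0
```

For the constant, the text suggests $c = n_{\cdot\cdot} - q$. Under that choice, $x B x' = \sum_{jk}(x_k - m_j)^2 n_{kj}$ behaves like a pooled within-object sum of squares with unit variance.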
If the second iterative method is used, $R^2$ may be obtained from
$$R^2 = \frac{x T D_z^{-1} T' x'}{x B x'}.$$

In the matrix $B = (D_r - F D_q^{-1} F')$, the term $D_r$ may be dominant. If this is the case, a rapid and reasonably accurate solution may be obtained by substituting $D_r^{-1}$ for $B^{*-1}$ in the second iterative method.

Numerical Example 


The preferences of 245 enlisted personnel of the United States Army were obtained for a set of menu items by means of a nine-category hedonic scale [13]. The frequency table of ratings for twelve of these foods is shown in Table 1. The body of Table 1 makes up the matrix $F$, the column totals the matrix $D_q$, and the row totals the matrix $D_r$.

TABLE 1

Preferences for Twelve Foods Obtained by the Method of Successive Categories

                                            Foods
Categories    1    2    3    4    5    6    7    8    9   10   11   12   Total
    9         4    6   31    4    7   22   12    3   28    1    9   12    139
    8        13   30   77   11   56   38   85   25   70   15   46   46    512
    7        32   72   75    ?    ?    ?    ?    ?    ?    ?   71   92    778
    6        37   62   40   32   60   76   33   74   41   40   42   61    598
    5        47   27   16   47   27   28   20   36   24   35   23   24    354
    4        29   26    7    ?   11    9    ?   23    1    ?    ?    3    249
    3        26   10    2   31    6    7    2    ?    ?   53    ?    8    149
    2        31    6    1   30    5    2    1    9    0   30   16    1    132
    1        24   15    4   17    4    0    0    8    2   19   21    1    115
  Total     243  254  253  250  254  253  254  253  252  253  253  254   3026

Frequencies in Table 1 were divided by their column totals and cumulated upwards. Normal deviates to the interval marks were estimated by the formula


$$z_{kj} = \frac{y_{(k-1)j} - y_{kj}}{n_{kj} / n_{\cdot j}},$$

where $y_{kj}$ and $y_{(k-1)j}$ are ordinates of the normal curve for the cumulative frequencies of the $k$th and $(k-1)$th categories for the $j$th menu item, the extreme ordinates being taken as zero.
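Here is a short sketch of the deviate formula for one menu item. It uses Python's standard-library `statistics.NormalDist` for the normal ordinate and the inverse cumulative distribution. It assumes the frequencies are listed from the lowest category upward (the direction of cumulation described above). It returns `None` for an empty cell, where the formula is undefined. The function name is hypothetical.

```python
from statistics import NormalDist


def interval_deviates(column):
    """z_kj = (y_(k-1)j - y_kj) / (n_kj / n_.j) for one object j.

    column[k] is n_kj, listed from the lowest category upward; the ordinates
    at the two extremes are taken as zero.
    """
    nd = NormalDist()              # standard normal
    total = sum(column)
    cum = 0
    prev_y = 0.0                   # ordinate below the lowest category
    deviates = []
    for n in column:
        cum += n
        p = cum / total            # cumulative proportion through this category
        y = nd.pdf(nd.inv_cdf(p)) if 0.0 < p < 1.0 else 0.0
        deviates.append(None if n == 0 else (prev_y - y) / (n / total))
        prev_y = y
    return deviates


if __name__ == "__main__":
    print(interval_deviates([1, 2, 1]))   # symmetric: middle deviate is zero
```

As a check, the numerators telescope, so the frequency-weighted mean of the deviates for any object is zero.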

From the original frequencies and the normal deviates the matrices $B$, $T$, and $D_z$ were formed. Nearly stationary values for $x$ and $s$ were computed by the second iterative method and brought to scale by (6). Mean affective values for the foods were computed by (15). The resulting values, together with approximations obtained by setting $B = D_r$, are shown in Table 2.
Also shown are the corresponding estimates computed by an iterative version 
of Gulliksen's method due to Diederich, Messick, and Tucker [4]. The weights 
used to make the latter method appropriate for incomplete data were zero 
for the three unused cells (Table 1) and unity elsewhere. All the resulting 
vectors have been brought to the same location and scale to facilitate their 
comparison. 

The regression line for food 3 has been plotted from the values of $m_3$ and $s_3$ from Table 2 for the exact solution. The corresponding regression line for Gulliksen's solution is shown for comparison. Normal deviates to the interval marks and boundaries are also plotted (Fig. 1).

Food 3 was chosen because the preference for it scaled poorly. Note 
that the regression line for Gulliksen's solution tips away from the deviates 
which represent the upper, much used, categories and toward the deviates




TABLE 2 


Comparison of Successive Intervals Solutions 


              Gulliksen's      Alternative Solution
              Solution         Exact      Approximate


Interval Scale Values 


πα 2.373 2.537 
8 1.916 1.320 1.326 
7 1912 432 
6 .105 -.281 
5 $5580 -.682 
4 med -.985 
3 2d -1.282 
2 -1.651 -1.634 
1 “2:012 -2.660 

Discriminal Dispersions
Foods
  1       1.152      1.103      1.080
  2       1.067      1.081      1.079
  3       1.122      1.058      1.110
  4       1.074      1.021       .984
  5        .970       .992       .992
  6        .748        ?         .768
  7        .950       .989      1.033
  8        .941       .956       .933
  9       1.045      1.057      1.094
 10       1.002      1.017       .978
 11       1.253      1.266      1.275
 12        .881       .907       .918

Mean Affective Values
Foods
  1       -.756      -.765      -.131
  2       -.157      -.139      -.149
  3         ?         .668       .669
  4         ?          ?        -.721
  5        .161       .186       .176
  6        .127       .157        ?
  7         ?         .750       .755
  8       -.200      -.188      -.196
  9        .602       .565       .568
 10       -.716      -.788      -.143
 11       -.142      -.089      -.089
 12        .341       .307        ?


of the lower, little used, categories. The alternative solution necessarily fits 
closely the deviates for the much used upper categories and tips away some- 
what from the two lowest categories. 


Discussion 


The successive intervals solution proposed in this note is currently being used in connection with models for predicting consumer choice and purchase (Bock [3] and Jones [12]). Because predictions based on these
models are sensitive to small differences in the estimated mean affective 
values of consumer objects, it is important that the method of scaling the 
preferences yield efficient estimates of these values. The proposed solution, 
which leads to the conventional mean once the scale values for the categories 
are chosen, has this property. The models also require estimates of the 
correlation of preferences between objects, and it is convenient to have a
solution in terms of interval mid-points which can be applied to the compu- 
tation of the correlations by the usual methods for grouped data. 


FIGURE 1

Comparison of Regression Lines for Food 3

(Lines for Gulliksen's solution and the alternative solution; axes: interval marks ($x_k$) against normal deviates to interval marks and boundaries.)




this property has been proposed by Fisher 


aw of comparative judgment applied to a problem in prediction of choice. Amer. Psychol., 1956, 11, 442-443. (Abstract)


[4] Diederich, G. W., Messick, S. J., and Tucker, L. R. A general least squares solution for successive intervals. Psychometrika, 1957, 22, 159-173.
[5] Finney, D. J. Probit analysi 


[6] Fisher, R. A. Statistical me 
[7] Guilford, J. P. Psychometric m 
[8] Gulliksen, H. A least s


construction. In P. Horst, et al., The predi personal adjustment, New York: 
Social Science Research Council, 1941, 319-438. 
[10] Horst, P. A solution i 


ychometrika, 1936, 1, 
[12] Jones, L. V. Prediction of consumer purchase. Amer. Psychol., 1956, 11, 443. (Abstract)
[13] Jones, L. V., Peryam, D. R., and Thurstone, L. L. Development of a scale for measuring soldiers' food preferences. Food Research, 1955, 20, 512-520.


Manuscript received 4/9/56 
Revised manuscript received 2/23/57 




PSYCHOMETRIKA, VOL. 22, NO. 3
SEPTEMBER, 1957 


COMMUNALITY OF A VARIABLE: 
FORMULATION BY CLUSTER ANALYSIS 


ROBERT C. TRYON*
UNIVERSITY OF CALIFORNIA 


The communality of a variable represents the degree of its generality across $n - 1$ behaviors. Domain-sampling principles provide a fundamental conception and definition of the communality. This definition may be alternatively stated in eight different ways. Three definitions lead to formulas that determine the true value of the communality: (i) from the $k$ necessary and sufficient dimensions derived by iterated factoring, (ii) from the $n - 1$ remaining variable-domains, and (iii) from $k'$ multiple clusters of the $n$ variables. Seven definitions provide approximation formulas: (i) one from the $k$ dimensions as initially factored, (ii) one from the $n - 1$ remaining variables, and (iii) five from a single cluster. Rank of the matrix is not a desideratum in some definitions. The formulas were applied to an example designed by Guilford to illustrate multiple-factor analysis. Those based on the three precise definitions recover the true communalities, and five approximation formulas each give values closer than the ad hoc estimates usually employed in factor analysis.


Some characteristics of individuals appear to have greater generality 
than others. By generality is meant the degree to which variation in a given 
behavior is also revealed in other behaviors—the degree to which it may be 
predicted from the others. One objective measure of its generality is its 
communality $h^2$. The communality is an important companion statistic to the reliability coefficient of the variable. Just as the reliability coefficient gives the degree of generality of a particular measurement across strictly
comparable test-samples of the same behavior property, the communality 
of the measure gives its generality across different behavior properties. 

The rise of factor analysis has introduced narrowness in the understanding and computation of both reliability and communality [10]. The first statistic is discussed in [11]. Here communality is defined in terms of the operations an investigator actually goes through in taking $n$ measures on objects by sampling methods. Following from such a definition, one may derive formulas for the exact value of $h^2$. The simple algebra of communality follows directly from the doctrines of behavior domain-sampling.

As an illustration of the formulations presented here, Table 1 gives the results of an application to an artificial problem designed by Guilford for his presentation of factor analysis ([2], p. 478ff.). In this 10-variable problem the true communality of each variable is given in the first row of Table 1.

*The writer wishes to express his indebtedness to C. F. Wrigley and Henry Kaiser for their many helpful constructive criticisms.




On domain-sampling principles, the communality of a variable may be computed according to the way the investigator designs his analysis. Thus in Table 1 the value may be computed (i) by factoring, (ii) directly from the $n - 1$ variables, (iii) from a single cluster of variables, or (iv) from multiple clusters. Some formulations give true values (rows 2, 6, 12); others are approximations (rows 3, 4, 5, 7, 8, 9, 10, 11, 13). The closeness of fit is given by the mean absolute deviation of computed from true values, last column. For comparative purposes, Table 1 includes results obtained by factor analysis (see row 4) and the ad hoc estimates (rows 14 and 15).


Fundamental Definition of Communality 
In cluster analysis, computing the value of the communality $h_a^2$ of any fallible variable $a$, among $n$ intercorrelated fallible variables $a, b, \ldots, n$, involves a choice among the computing formulas that follow from alternative definitions of the communality. For the given set of $n$ variables there is, of course, only one correct value of $h_a^2$ in the population. Consider here the fundamental definition of communality.


Definition 1: $h^2$, the correlation with a congruent parallel variable

Under the principles of domain-sampling, one definition of $h^2$ is
$$h_a^2 = r_{aa'} = r_{aa''} = \cdots = r_{aa_{n_c}}, \tag{1}$$
where $a'$ is a construct variable from an infinite set of parallel variables $a', a'', \ldots, a_{n_c}$, all of which would have perfectly congruent correlations with the observed $n - 1$ variables. Specifically, perfect congruence means
$$r_{ai} = r_{a'i} = r_{a''i} = \cdots = r_{a_{n_c} i}, \qquad (i = b, c, \ldots, n). \tag{2}$$


Definition 2: $h^2$, the predictable variance from the variable-domain

The observed variable $a$ may be considered as one sample measure from a composite score $C_a$ on a domain of such parallel variables, namely,
$$C_a = z_a + z_{a'} + \cdots + z_{a_{n_c}}. \tag{3}$$
$C_a$ may be called the cluster domain score of variable $a$. By the general formula for the correlation of sums and using (2), the correlation of observed variable $a$ with its domain score $C_a$ is
$$r_{aC_a} = \frac{1 + (n_c - 1)\, r_{aa'}}{\sqrt{n_c + n_c(n_c - 1)\, r_{aa'}}}.$$
Dividing numerator and denominator by $n_c$, using (1), and letting $n_c$ grow without limit,
$$r_{aC_a} = h_a. \tag{4}$$
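A quick numerical check of the limit behind (4) uses only the correlation-of-sums expression above. Take $r_{aa'} = h_a^2$ as fixed. When the domain score contains only $a$ itself, $r_{aC_a} = 1$, and the value falls toward $h_a$ as $n_c$ grows. The function name and the value $h_a^2 = .64$ are illustrative.

```python
import math


def r_with_domain_score(n_c, h2):
    """Correlation of z_a with C_a = z_a + z_a' + ... over n_c parallel variables."""
    return (1 + (n_c - 1) * h2) / math.sqrt(n_c + n_c * (n_c - 1) * h2)


if __name__ == "__main__":
    for n_c in (1, 2, 10, 1000, 10 ** 6):
        print(n_c, round(r_with_domain_score(n_c, 0.64), 6))   # approaches h_a = 0.8
```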


The magnitude $h_a$ is termed the cluster domain validity of variable $a$. It reveals how well the observed variation among objects in variable $a$ matches that in a perfect cluster domain measure of $C_a$. For a geometric model of these relations, see [12].

TABLE 1

Communality by Various Formulations (Guilford Problem)

The variance of the fallible scores predicted from the domain scores is the square of the correlation between them, i.e., from (4),
$$h_a^2 = r^2_{aC_a}, \tag{5}$$
which is the second definition of the communality.
The two definitions of the communality in (1) and (5) have certain restrictions:

(i) The parallel construct variables $a, a', \ldots, a_{n_c}$ must, by operational definition, measure different behavior domains, as stated in the first paragraph
of this article. Variables that are sample measu 
domain are the elements essential to ca 


general restriction that the 
pose C, need only rev. 


(26). 

Note that (1) and (5) provide two meaningful definitions of the communality quite independent of the number of dimensions (or "factors," or the "rank") necessary and sufficient to reproduce the $n - 1$ correlations of variable $a$.


Communality from $k$ Independent Dimensions

Definition 3: $h^2$, the sum of partial communalities


(6) Taos = Brosse = У) Beta, G=1,2,... , BD. 
Since the predictors are uncorrelated, then from (5) 


$$h_a^2 = r^2_{aC_1} + r^2_{aC_2} + \cdots + r^2_{aC_k}, \tag{7}$$
where the successive $r^2$ terms are called the partial communalities (squared coordinates), $h^2_{a1}, h^2_{a2}, \ldots, h^2_{ak}$, respectively. These are the predictable variances of the objects' observed scores on $a$ from the $k$ independent dimensions. Factorists call these variances the squared "factor loadings." Here is a third definition of $h^2$, based on the factoring procedure.


eal proportional correlations 


ROBERT C. TRYON 245 


Definition 4: $h^2$, the squared multiple $R$ with the independent dimensions

The factoring process perforce yields the uncorrelated dimensions $C_1, C_2, \ldots, C_k$. In (6), for such independent predictors,
$$\beta_{aC_g} = r_{aC_g},$$
whence
$$h_a^2 = \beta^2_{aC_1} + \beta^2_{aC_2} + \cdots + \beta^2_{aC_k}; \tag{8}$$
$$h_a^2 = R^2_{a \cdot C_1 C_2 \cdots C_k}. \tag{9}$$
Thus, by definition 4, the communality is the squared multiple $R$ between variable $a$ and the $k$ independent dimensions necessary and sufficient to yield vanishing residual correlations.
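Definitions 3 and 4 coincide because the dimensions are uncorrelated. The sketch below computes the squared multiple correlation $R^2 = \beta' r$ from the normal equations $R_{CC}\,\beta = r$, solved by Gaussian elimination. With $R_{CC} = I$ it returns the sum of squared loadings, as in (7) and (8). With correlated predictors the two quantities differ. The loadings and the correlated example are made-up numbers, and the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def squared_multiple_r(r_pred, r_y):
    """R^2 of a criterion on predictors: beta solves R_pp beta = r_y; R^2 = beta . r_y."""
    beta = solve(r_pred, r_y)
    return sum(b * r for b, r in zip(beta, r_y))


if __name__ == "__main__":
    loadings = [0.6, 0.5, 0.3]                       # r_{aC_g} on k = 3 dimensions
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    print(squared_multiple_r(identity, loadings))    # equals the sum of squares, .70
```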


Communality from the $n - 1$ Variables

Definition 5: $h^2$, the squared multiple $R$ with the $n - 1$ remaining variable-domains

Each of the independent cluster domains $C_1, C_2, \ldots, C_k$ is a mere linear composite of variable-domain scores ([12], formula 8),


ao о, = 0, + б,+ +6... 


By excluding any one of the $n$ variables, the $k$ dimensions are not reduced to $k - 1$. For example, $C_a$ cannot alone be responsible for any one dimension. It must share with at least one other variable (and usually more) some predictable variance from the dimension; otherwise the dimension would not be required. Therefore, $C_b, C_c, \ldots, C_n$ may replace $C_1, C_2, \ldots, C_k$ in (9), whence
$$h_a^2 = R^2_{a \cdot C_b C_c \cdots C_n}. \tag{11}$$

Definition 5 implies that the communality of a variable represents the degree of its generality across the $n - 1$ other variables, being its variance predictable from the remaining variable-domains.

Evaluation of $R_{a \cdot C_b C_c \cdots C_n}$

Evaluation of $R$ in (11) will provide, along with (7), a … computing formula for the communality, in this case without …

Writing the predicted $\hat z_a$ of (11) in the form of a regression equation on the $C$'s:


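$$\hat z_a = \sum_i \beta_{aC_i} z_{C_i}. \tag{12}$$

Here and below, let $i, j = b, c, \ldots, n$ and $i < j$. Then, by the formula for multiple $R$ in terms of $\beta$'s,
$$R_{a \cdot C_b C_c \cdots C_n} = \frac{\sum_i \beta_{aC_i} r_{aC_i}}{\sigma_{\hat z_a}}. \tag{13}$$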




In the numerator of (13) the $r$ term equals the observed $r_{ai}$ augmented on $i$, i.e.,
$$r_{aC_i} = r_{ai}/h_i. \tag{14}$$


To develop the denominator term of (13), $\sigma_{\hat z_a}$, note that the $z$ in (12) is the standard score of $C_i$. Noting that $C_i$ is a sum of parallel variables, $z_i, z_{i'}, \ldots, z_{i_{n_c}}$, as in (3), then (12) may be rewritten as
$$\hat z_a = \sum_i \frac{\beta_{aC_i}}{\sigma_{C_i}} \left(z_i + z_{i'} + \cdots + z_{i_{n_c}}\right). \tag{15}$$


Since the z scores here are those of the parallel variables symmetrical with 
(3), they have defined statistically parallel properties symmetrical with 
(1) and (2). 

The square root of the variance of (15) is needed in (13). This expression is the sum of all the elements of a covariance matrix which has as rows and columns the $n_c$ terms in $z_b$, the $n_c$ terms in $z_c$, and so on. This grand matrix is composed of submatrices on the leading diagonal involving the $r_{ii'}$ sets such as $r_{bb'}$, and the side submatrices involving the $r_{ij}$ sets such as $r_{bc}$. In the development below recall that $r_{ii'} = h_i^2$, by definition, as in (1), hence
$$\sigma^2_{C_i} = n_c + n_c(n_c - 1) h_i^2. \tag{16}$$


In the $i$th diagonal submatrix the sum of its own principal diagonal becomes, in the limit (as the result of dividing numerator and denominator by $n_c$),
$$n_c \beta^2_{aC_i} / \sigma^2_{C_i} = \beta^2_{aC_i} / \left[1 + (n_c - 1) h_i^2\right] \rightarrow 0. \tag{17}$$
In this submatrix, the sum of its side elements becomes in the limit
$$n_c (n_c - 1) \beta^2_{aC_i} h_i^2 / \sigma^2_{C_i} \rightarrow \beta^2_{aC_i}. \tag{18}$$


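In the $ij$th side submatrix, of which there are two by symmetry in the grand covariance matrix, the sum of their elements is, in the limit,
$$2 n_c^2 (\beta_{aC_i}/\sigma_{C_i})(\beta_{aC_j}/\sigma_{C_j})\, r_{ij} = 2 (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j)\, r_{ij}. \tag{19}$$
Summing the terms (17), (18), and (19) over the grand matrix,
$$\sigma^2_{\hat z_a} = \sum_i \beta^2_{aC_i} + 2 \sum_{i<j} (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j)\, r_{ij}. \tag{20}$$

Finally, substituting (14) in the numerator of (13), and the square root of (20) in its denominator, gives by squaring the whole the communality,
$$h_a^2 = R^2_{a \cdot C_b C_c \cdots C_n} = \frac{\left[\sum_i (\beta_{aC_i}/h_i)\, r_{ai}\right]^2}{\sum_i \beta^2_{aC_i} + 2 \sum_{i<j} (\beta_{aC_i}/h_i)(\beta_{aC_j}/h_j)\, r_{ij}}, \qquad (i, j = b, c, \ldots, n;\ i < j). \tag{21}$$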




Evaluating the expression on the extreme right of (21) would give an exact solution to $h_a^2$. There are $n - 1$ unknowns $h_b, h_c, \ldots, h_n$ in this equation. The $\beta$'s involve only terms in $r_{ij}$ (which are known) and in the $n - 1$ values of the unknown $h$'s. Since the $h$'s are in first and second powers, the equation is quadratic. Recalling that there are $n$ equations (21) to be solved simultaneously, formula (21) may be called the simultaneous quadratic formula for the communality. The positive roots of the $n$ quadratics are the desired $h$ values.


Iterative solution of $R_{a \cdot C_b C_c \cdots C_n}$

By electronic computer the $n$ simultaneous quadratics may be solved by inserting initial trial values of the $h$'s, solving the quadratics for new $h$ values, and continuing the iterative process to convergence. Speed of convergence depends on the choice of trial values. The writer recommends that trial values be those known on domain-sampling principles to be close to the correct values. As will be shown later, approximation B provides such close values. Because this solution is a critical one in this paper, the steps involved in an electronic program are listed at this point:


(i) Selection of a reference cluster of congruent variables for variable $a$:

(a) Calculate the degree of congruence of the correlation profile of variable $a$ with the profiles of each of these variables. An objective measure of congruence with such a variable, call it $v$, is the index of proportionality of the $n - 2$ corresponding $r_{ai}$ and $r_{vi}$ values (see [12], formula 6).

(b) Select as two reference variables of $a$ the two with the highest indices, and compute $h_a^2$ by approximation B, formula (29).

The ten trial values for the Guilford problem by this approximation are listed in Table 1, row 5.


(ii) Iteration for $h^2$ by the simultaneous quadratic formula (21):

(a) One can now set up a correlation matrix with trial values of all the coefficients necessary to solve (21). One row is for variable $a$ and its correlations with the domains $C_b, \ldots, C_n$. In this row, $r_{aC_i} = r_{ai}/h_i$. The remaining $n - 1$ rows and columns include the trial augmented correlations between the $n - 1$ predictor variable-domains, i.e., $r_{ij}/(h_i h_j)$.

(b) Compute (21) for each of the $n$ variables from the matrix described in (a) just above, thus securing the first iterated value of the communalities.

(c) With these new values of $h$ for all the variables, set up again new augmented matrices as described in (a) above.

(d, etc.) Recompute the $h$ values by (21) and continue the process to convergence.
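Step (i) above can be sketched as follows. The text defers the index of proportionality to [12]. For illustration only, it is assumed here to be the cosine-type coefficient $\sum_i r_{ai} r_{vi} / \sqrt{\sum_i r_{ai}^2 \sum_i r_{vi}^2}$ taken over the $n - 2$ variables other than $a$ and $v$. With the two best reference variables $b$ and $c$, formula (29) for a three-variable cluster reduces to the triad $h_a^2 = r_{ab} r_{ac} / r_{bc}$. Function names are hypothetical, and the test matrix is a made-up one-factor matrix.

```python
import math


def proportionality(R, a, v):
    """Assumed index of proportionality of the r_ai and r_vi profiles (i != a, v)."""
    others = [i for i in range(len(R)) if i not in (a, v)]
    num = sum(R[a][i] * R[v][i] for i in others)
    den = math.sqrt(sum(R[a][i] ** 2 for i in others) * sum(R[v][i] ** 2 for i in others))
    return num / den


def trial_communality(R, a):
    """Pick the two most congruent reference variables b, c; return r_ab r_ac / r_bc."""
    ranked = sorted((i for i in range(len(R)) if i != a),
                    key=lambda v: proportionality(R, a, v), reverse=True)
    b, c = ranked[0], ranked[1]
    return R[a][b] * R[a][c] / R[b][c]
```

In the program described above, these trial values would seed the augmented matrices of step (ii).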


I am indebted to Dr. Henry Kaiser for programming this solution for the Guilford problem, using an electronic computer (IBM Type 9…). The results are given in Table 1, rows 6a to 6e. Each iteration required 5 minutes.




Note the speed of convergence to the true values. By the 10th iteration, the mean deviation from true values is 3 one-hundred-millionths.

Dr. Henry Kaiser has tried upper and lower bounds of $h^2$ as trial values
in (21) in a number of artificial and empirical matrices. His results appear 
promising but convergence by these approaches seems at the present writing 
to be erratic [5]. 


Approximation A: $h^2$ from the squared multiple $R$ with the $n - 1$ variables

Suppose the impossible: that one had before him a matrix of correlations among an indefinitely large number of variables. From (9) the scores of the objects on the independent cluster domains $C_1, C_2, \ldots, C_k$ would be mere linear composites of the $n_c$ variables. But since $h_a^2$ is the squared multiple $R$ between variable $a$ and the beta-weighted scores on the $C$'s, it would equal the squared multiple $R$ between $a$ and beta-weighted scores on the $n_c$ variables that compose them, that is,
$$R^2_{a \cdot bc \cdots n_c} = h_a^2. \tag{22}$$


In practice one deals with a finite $n$. Suppose one now conceptualizes the finite variables as a representative sampling of $n$ variables, drawn from an infinite domain of $n_c$ variables that has the same general pattern of correlations with $a$ and with each other as the actual $n - 1$ variables. Then, on the theorem that the multiple $R$ increases as one adds similar kinds of predictors,
$$R^2_{a \cdot bc \cdots n} \le R^2_{a \cdot bc \cdots n_c} = h_a^2. \tag{23}$$


The squared multiple R between variable a and the n — 1 other variables 
is thus a lower limit of the communality. Though this relation is already 
known [3], its simple logic and proof on domain-sampling principles is of 
interest. If the betas are significant on a reasonable number of the n variables, 
then it seems likely that the value of the squared multiple R approaches the 
value of the communality. The lower limits of the communalities by (23) 
for the ten variables of our illustration appear in Table 1, row 7. 
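The lower bound in (23) is commonly computed from the inverse of the full correlation matrix through the standard identity $R^2_{a \cdot bc \cdots n} = 1 - 1/r^{aa}$, where $r^{aa}$ is the $a$th diagonal element of $R^{-1}$. That identity is a well-known matrix result, not a formula stated in this paper. The sketch inverts the matrix by Gauss-Jordan elimination. The test checks, on a made-up one-factor matrix with unit diagonal, that each value lies strictly between zero and the true communality.

```python
def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A square and nonsingular)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for i in range(n):
            if i != col:
                f = M[i][col]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[col])]
    return [row[n:] for row in M]


def squared_multiple_correlations(R):
    """Squared multiple R of each variable on the n - 1 others: 1 - 1 / (R^{-1})_aa."""
    Rinv = inverse(R)
    return [1.0 - 1.0 / Rinv[a][a] for a in range(len(R))]
```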


Communality from a Single Cluster of Variables 


Above, $h_a^2$ is expressed as a function of all $n$ variables; hereafter, as a function of a cluster grouping of the $n$ variables. First, develop the correlation between variable $a$ and a domain score on a cluster of variables that includes $a$. This domain score is defined as
$$C_t = \Sigma_a + \Sigma_b + \cdots + \Sigma_s, \tag{24}$$
$$C_t = C_a + C_b + \cdots + C_s, \tag{24a}$$
where $s$ is the number of variables in the cluster, and $\Sigma_a = C_a = z_a + z_{a'} + \cdots + z_{a_{n_c}}$ as defined in (3). Here $a, a', \ldots, a_{n_c}$ are parallel variables, perfectly congruent as defined in (2) and having equal communalities, $r_{aa'} = h_a^2$, as defined in (1); the other summation terms of (24) are analogous.
The correlation between variable $a$, the first term within $\Sigma_a$, and the composite of (24) becomes by the correlation of sums in the limit
$$r_{aC_t} = \frac{h_a^2 + r_{ab} + \cdots + r_{as}}{\sqrt{\sum_i h_i^2 + 2 \sum_{i<j} r_{ij}}}, \qquad (i, j = a, \ldots, s;\ i < j). \tag{25}$$
Here is a basic formula of cluster analysis and, for that matter, of factor 
analysis. When s = δι, it is the partial communality, h?, in the Key Cluster 
Method of cluster analysis ([12], formulas 9 and 19). When $s = n$, it is the
squared centroid factor loading in Thurstone factor analysis. As seen below 
it provides the exact value of the communality of a variable under certain 
specified conditions that are, however, difficult to meet in practice. 


Definition 6: $h^2$, the correlation with a cluster domain of congruent (collinear) variables

Take the case in which the variables that compose $C_t$ have congruent profiles of their columns of $r$ in the correlation matrix, though not necessarily at the same level of correlation. (The term "congruent" has a geometrically equivalent expression, "collinear," referring to the fact that such variables lie on the same straight line from the origin, i.e., on the same vector in $n$ or $k$ space.) The precise definition of congruence (or collinearity) is
$$r_{ai}/r_{bi} = \text{a constant};\quad r_{bi}/r_{ci} = \text{a constant};\quad \ldots;\quad r_{ri}/r_{si} = \text{a constant}; \qquad (i = \text{the remaining } n - 2 \text{ variables}). \tag{26}$$

Congruent variables are also called "equiproportional," and their submatrix of intercorrelations is called "hierarchical," or having a "rank" of one.
For the case of two variables, $a$ and $b$, introducing their correlations with their respective parallel variables into (26), noting (1) and (2), gives
$$r_{aa'}/r_{ba'} = r_{ab'}/r_{bb'} = h_a^2/r_{ab} = r_{ab}/h_b^2,$$
or
$$r_{ab} = h_a h_b, \quad \text{that is, in general,} \quad r_{ij} = h_i h_j. \tag{27}$$

Now write the special case of $r_{aC_t}$ by (25) for $s$ congruent variables, i.e., substitute (27) in (25):
$$r^2_{aC_t} = \frac{\left[h_a^2 + h_a (h_b + \cdots + h_s)\right]^2}{\sum_i h_i^2 + 2 \sum_{i<j} h_i h_j}, \qquad (i, j = a, \ldots, s;\ i < j).$$
Multiply out the numerator, then take out $h_a^2$. The resulting parenthesis term reduces to $(\sum_i h_i)^2$, which after expansion is equivalent to the denominator. In short,
$$r^2_{aC_t} = \frac{h_a^2 \left(\sum_i h_i\right)^2}{\sum_i h_i^2 + 2 \sum_{i<j} h_i h_j} = h_a^2, \qquad (i, j = a, \ldots, s;\ i < j). \tag{28}$$


Here, then, is definition 6 of the communality of variable a, namely, its 
squared correlation with a domain score of s congruent variables. 
To evaluate (28), one uses (27) to obtain
$$h_a^2 = h_a^2 \sum h_i h_j \Big/ \sum h_i h_j = \sum (h_a h_i)(h_a h_j) \Big/ \sum h_i h_j = \sum r_{ai} r_{aj} \Big/ \sum r_{ij}, \qquad (i, j = b, \ldots, s;\ i < j). \tag{29}$$

This formula was used by Spearman to measure his $g$ saturation of variable $a$ ([6], appendix, formula 20). On principles of domain-sampling, Spearman's $g$ was thus simply the composite domain defined by (24) composed of congruent variables. But (29) is awkward to use in an actual matrix; hence its equivalent, also developed by Spearman ([6], appendix, formula 21), is used:
$$h_a^2 = \frac{\left(\sum_i r_{ai}\right)^2 - \sum_i r_{ai}^2}{2 \sum_{i<j} r_{ij}}, \qquad (i, j = b, \ldots, s;\ i < j). \tag{30}$$
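The sketch below, with hypothetical function names, evaluates (25) when the $h^2$ values are given, and Spearman's two forms (29) and (30). Since $(\sum_i r_{ai})^2 - \sum_i r_{ai}^2 = 2\sum_{i<j} r_{ai} r_{aj}$, (29) and (30) agree for any matrix. On a made-up congruent cluster built from $r_{ij} = h_i h_j$ as in (27), both return $h_a^2$ exactly, and (25) returns $h_a$, as (28) asserts.

```python
import math
from itertools import combinations


def r_a_ct(R, a, cluster, h2):
    """Formula (25): correlation of variable a with the cluster domain C_t."""
    num = h2[a] + sum(R[a][i] for i in cluster if i != a)
    den = sum(h2[i] for i in cluster) + 2 * sum(R[i][j] for i, j in combinations(cluster, 2))
    return num / math.sqrt(den)


def spearman_29(R, a, others):
    """(29): sum r_ai r_aj / sum r_ij over pairs i < j of the other cluster members."""
    pairs = list(combinations(others, 2))
    return sum(R[a][i] * R[a][j] for i, j in pairs) / sum(R[i][j] for i, j in pairs)


def spearman_30(R, a, others):
    """(30): [(sum r_ai)^2 - sum r_ai^2] / (2 sum r_ij)."""
    s1 = sum(R[a][i] for i in others)
    s2 = sum(R[a][i] ** 2 for i in others)
    return (s1 ** 2 - s2) / (2 * sum(R[i][j] for i, j in combinations(others, 2)))
```

Form (30) needs only row sums, sums of squares, and one submatrix total. This is presumably why the text prefers it for actual matrices.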


Approximation B: $h^2$ from squared $r$ with an approximately congruent cluster domain


In practice one rarely finds strictly congruent variables. But one can 
always group the n variables systematically into as many approximately 
congruent clusters as possible (see above in evaluating formula (21), also 
[9], appendix B, and [12]). The approximate communality of each variable 
can then be computed by (30), or (29). In the Guilford problem, Table 1, 
row 8, the fit to the true values is shown by an average absolute deviation 
of .021. 


Approximation C: $h^2$ from converged squared $r$ with an approximately congruent cluster domain

In this approximation the general formula (25) is used. If values for $h^2$ were at the right, there would be a solution. Suppose the correct values of the $h^2$ terms were known and $C_t$ were the variable-domain $C_a$. Then the general formula (25) would read
$$h_a^2 = r^2_{aC_a} = r^2_{aC_t} = \frac{\left(h_a^2 + r_{ab} + \cdots + r_{as}\right)^2}{\sum_i h_i^2 + 2 \sum_{i<j} r_{ij}}, \qquad (i, j = a, \ldots, s;\ i < j). \tag{31}$$

To get an approximate solution, put trial values for the $h^2$ terms on the right, using values that are known on domain-sampling grounds to be close to the correct values. Then iterate on the $h^2$ values until convergence, that is, until the $h_a^2$ on the right of (31) equals the $h_a^2$ on the left. The solution will not in general give the correct values, because $r_{aC_t} \neq r_{aC_a}$. It will approximate the correct values as the variables that compose $C_t$ approach congruence, a condition which, if met, would result in the identities of (31).

The first step, then, is to get good trial values for the $h^2$ terms on the right of (31). To do so, define the cluster domain score of $C_t$ only slightly differently from that defined in (24): let it be a composite score of an indefinitely large sample of variables representative of $a, b, \ldots, s$, that is,
$$C_r = z_a + z_b + \cdots + z_s + \cdots + z_{n_c}. \tag{32}$$
Representativeness of the fully extended set of $n_c$ variables means that they have the same average correlation properties as the observed set of $s$ variables,
$$\bar r_{ai} \text{ for the full set } b, \ldots, n_c = \bar r_{ai} \text{ for the observed set } b, \ldots, s, \tag{33}$$
and similarly for $\bar r_{bi}, \bar r_{ci}, \ldots, \bar r_{si}$; also
$$\bar r_{ij} \text{ for the full set } a, \ldots, n_c = \bar r_{ij} \text{ for the observed set } a, \ldots, s, \quad (i < j). \tag{33a}$$
The virtual identity of definitions (32) and (24) should result in $r_{aC_r} = r_{aC_t}$. From the correlation of sums,
$$r_{aC_r} = \frac{1 + (n_c - 1)\, \bar r_{ai}}{\sqrt{n_c + n_c (n_c - 1)\, \bar r_{ij}}}, \qquad (i, j = a, \ldots, n_c;\ i < j).$$
Taking the limit, i.e., dividing numerator and denominator by $n_c$, substituting (33) and (33a), and squaring,
$$h_a^2 \approx r^2_{aC_r} = \bar r_{ai}^2 / \bar r_{ij}, \qquad (i = b, \ldots, s;\ i, j = a, \ldots, s;\ i < j). \tag{34}$$
There are no unknowns here. The approximation of $h_a^2$ from (34) is the simplest to compute. In the submatrix of intercorrelations between the variables $a, \ldots, s$, it requires only the mean $r$ in the row of variable $a$ and the mean $r$ over the whole submatrix. Call approximation (34):


Approximation $D_1$: $h^2$ from squared $r$ with an approximately congruent representative constant cluster domain

Note that in the Guilford problem, Table 1, row 10a, approximation $D_1$ is not as accurate as approximation B. Its mean absolute deviation from the true values is .032, compared with .021.


Approximation $D_2$: $h^2$ from squared $r$ with an approximately congruent representative shifting cluster domain

An approximation that gives results closer to the true values than $D_1$ is secured by a slight change in the definition of the domain score in (32). In this revised definition $C_r$ excludes the variable whose communality is being computed. For example, for $h_a^2$, $C_r$ omits $z_a$; for $h_b^2$, $C_r$ omits $z_b$;


and so on. Thus, for the calculation of the $h^2$ of each variable in the cluster, the cluster domain "shifts" by virtue of excluding the given variable from it. Definition (33) remains unaltered in this case, but in definition (33a) both the full and the observed sets begin with $b$ ($a$ being excluded). In the limit, then, approximation $D_2$ takes the same general form as (34), that is,

$$(34a)\qquad h_a^2 \approx \frac{\bar r_a^{\,2}}{\bar r_{ij}}, \qquad \text{the same as (34) except that in the denominator } i, j = b, \ldots, s;\ i < j.$$


In Table 1, row 10b, note that the communalities by $D_2$ show a mean absolute deviation of .023, nearly as good as those by B, and better than those by $D_1$.
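A minimal Python sketch of approximations $D_1$ (34) and $D_2$ (34a), taking both as $\bar r_a^{\,2}/\bar r_{ij}$. In $D_2$ the denominator mean excludes variable $a$. The correlation submatrix is assumed to be a nested list indexed by the cluster's variables. The function names and the one-factor example matrix are hypothetical illustrations, not the Guilford data.

```python
from itertools import combinations
from statistics import mean


def mean_row_r(R, a, cluster):
    """Mean correlation of variable a with the other cluster members (j = b, ..., s)."""
    return mean(R[a][j] for j in cluster if j != a)


def mean_pair_r(R, members):
    """Mean off-diagonal correlation r_ij (i < j) among the given members."""
    return mean(R[i][j] for i, j in combinations(members, 2))


def approx_d1(R, a, cluster):
    """Approximation D1, formula (34): rbar_a**2 / rbar_ij, with rbar_ij over a, ..., s."""
    return mean_row_r(R, a, cluster) ** 2 / mean_pair_r(R, cluster)


def approx_d2(R, a, cluster):
    """Approximation D2, formula (34a): the denominator mean excludes variable a.

    Needs at least three cluster members so that b, ..., s still contain a pair.
    """
    others = [j for j in cluster if j != a]
    return mean_row_r(R, a, cluster) ** 2 / mean_pair_r(R, others)


# Hypothetical one-factor matrix: r_ij = l_i * l_j, so the true h_a^2 is l_a**2.
loadings = [0.8, 0.7, 0.6, 0.5]
R = [[1.0 if i == j else li * lj for j, lj in enumerate(loadings)]
     for i, li in enumerate(loadings)]
cluster = [0, 1, 2, 3]
print(round(approx_d1(R, 0, cluster), 3))  # 0.551
print(round(approx_d2(R, 0, cluster), 3))  # 0.646 (true value 0.64)
```

In this toy matrix $D_2$ lands closer to the true value than $D_1$, which agrees with the Guilford comparison. A single constructed example establishes nothing general.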


Iteration for approximation C 


Inserting the values by $D_1$ or $D_2$ as the initial trial values in (31), one
can iterate to convergence. The successive values form a geometric progression, 
so one can take the limit of such a progression as the final converged value. 
With the values of D, , as trial values, the converged limits, given in Table 
1, row 9, are seen to be equivalent to those by approximation B. 
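One way to take the limit of a geometric progression from three successive trial values is sketched below. The method is the standard Aitken delta-squared extrapolation. It is exact when successive differences shrink by a constant ratio. Using it here is an assumption about how the limit might be computed, not a procedure given in the text, and the function name is hypothetical.

```python
def geometric_limit(x0, x1, x2):
    """Limit of a sequence whose successive differences form a geometric progression.

    If x_{t+1} - x_t = q * (x_t - x_{t-1}) with |q| < 1, the limit is
    x0 - (x1 - x0)**2 / (x2 - 2*x1 + x0)   (Aitken's delta-squared formula).
    """
    denom = x2 - 2.0 * x1 + x0
    if denom == 0.0:
        # The differences are not shrinking geometrically (e.g., already converged).
        return x2
    return x0 - (x1 - x0) ** 2 / denom


# Hypothetical iterates of one communality: 0.5 + 0.2 * 0.6**t for t = 0, 1, 2.
print(geometric_limit(0.7, 0.62, 0.572))  # approximately 0.5
```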


Approximation E: $h^2$ from a simple quadratic, squared $r$ with an approximately congruent representative cluster domain

The cluster domain $C_r$ can be defined in still another, slightly different way: as a composite of variables representative of the observed set $a, b, \ldots, s$, but with the observed set deleted, as follows:

$$(35)\qquad C_{r'} = z_{a'} + z_{b'} + \cdots + z_{s'} + \cdots + z_{n_\infty}.$$

The variables $a', b', \ldots, s'$ are, respectively, defined as parallel to the observed $a, b, \ldots, s$, as stated in (1) and (2), and the $n_\infty$ extended set has the average statistical correlation properties defined in (33) and (33a). Once again the virtual identity of the domain in (35) and that in (24) should yield $r_{aC_{r'}} = r_{aC_a}$. From the correlation of sums, taking the limit, and substituting (1) and the equivalents of (33) and (33a),

$$(36)\qquad r_{aC_{r'}} = \frac{h_a^2 + (s-1)\,\bar r_a}{s\sqrt{\bar r_{ij}}}, \qquad (j = b, \ldots, s;\ i, j = a, \ldots, s;\ i < j).$$

There are only two unknowns here: that on the left, and $h_a^2$ on the right. Recall that the domain $C_{r'}$ is an approximation to $C_r$, and it in turn is an approximation to $C_a$. Remember that $C_r$ would be $C_a$ if the $s$ variables were congruent. That is, $h_a \approx r_{aC_{r'}}$.

Thus set $h_a$ on the left of (36). The result is a simple quadratic in $h_a$.




Solving and squaring,

$$(37)\qquad h_a^2 = \left\{\tfrac12\left[s\sqrt{\bar r_{ij}} \pm \sqrt{s^2\,\bar r_{ij} - 4(s-1)\,\bar r_a}\,\right]\right\}^2, \qquad (j = b, \ldots, s;\ i, j = a, \ldots, s;\ i < j).$$

$h_a^2$ is the square of the positive root between zero and unity.

On domain-sampling principles, approximation E to the communality in (37) should be about as good as B and C. In the Guilford problem, Table 1, row 11, the mean absolute deviation from the true values is about the same as for B and C, namely, .024 as against .021.
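A Python sketch of approximation E. It solves the quadratic that results from setting $h_a = r_{aC_{r'}}$ in (36). That quadratic is taken to be $h^2 - s\sqrt{\bar r_{ij}}\,h + (s-1)\bar r_a = 0$, an assumption about the exact form of (36). The sketch returns the square of the root between zero and unity. The function name and the one-factor test cluster are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def approx_e(R, a, cluster):
    """Approximation E: solve h**2 - s*sqrt(rbar_ij)*h + (s - 1)*rbar_a = 0
    and return the square of the root lying between zero and unity."""
    s = len(cluster)
    rbar_a = mean(R[a][j] for j in cluster if j != a)            # j = b, ..., s
    rbar_ij = mean(R[i][j] for i, j in combinations(cluster, 2))  # i < j over a, ..., s
    p = s * math.sqrt(rbar_ij)
    q = (s - 1) * rbar_a
    disc = p * p - 4.0 * q
    if disc < 0:
        raise ValueError("the quadratic has no real root for this cluster")
    roots = ((p - math.sqrt(disc)) / 2.0, (p + math.sqrt(disc)) / 2.0)
    admissible = [h for h in roots if 0.0 < h < 1.0]
    if not admissible:
        raise ValueError("no root between zero and unity")
    return admissible[0] ** 2  # smaller admissible root, squared


loadings = [0.8, 0.7, 0.6, 0.5]  # hypothetical one-factor cluster; true h_0^2 = 0.64
R = [[1.0 if i == j else li * lj for j, lj in enumerate(loadings)]
     for i, li in enumerate(loadings)]
print(round(approx_e(R, 0, [0, 1, 2, 3]), 3))  # 0.657
```

The larger root in this example is about 1.78, so the rule "the positive root between zero and unity" selects a unique value here.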


Ad hoc estimates of the communality 


None of the above approximations are ad hoc estimates. All are formulated as squared correlations between variable $a$ and a defined domain score of the objects. These domain scores are not exactly equivalent to, but are nevertheless close to, the domain score $C_a$ required in the basic definition of communality in (1), (2), and (3). Sheer ad hoc guesses are likely to be far off the mark. In this category fall the use of the highest $r_{aj}$ or Burt's modified highest $r$, as recommended by centroid factorists ([1], p. 153 ff.). Note that in the Guilford problem, Table 1, rows 15 and 14, these two ad hoc estimates have absolute
mean deviations from the true value of .083 and «007, respectively, about 
four times and three times as poor as approximations from B, C, and D. 

If an analyst wishes a quick ad hoc estimate, he can do better than the above estimates by using one based on the following rationale. Approximation $D_1$ by $r^2_{aC_r}$ and approximation E by $r^2_{aC_{r'}}$ should be roughly equivalent. On these grounds, let us set the $h_a$ by (34) equal to that by (36) and solve for the one unknown, $h_a^2$. A little algebra will then show that

$$\frac{\bar r_a}{\sqrt{\bar r_{ij}}} = \frac{h_a^2 + (s-1)\,\bar r_a}{s\sqrt{\bar r_{ij}}} \quad\Longrightarrow\quad (38)\qquad h_a^2 = \bar r_a, \qquad (j = b, \ldots, s).$$

To calculate this estimate one need only discover, say, two or three reference variables with which variable $a$ is most congruent, and compute its mean correlation with these reference variables. For the Guilford problem, Table 1, row 13, this estimate has a mean absolute deviation of .050. That is much better than the other ad hoc estimates, but poorer than approximations B, C, D, and E.
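A small Python check of (38). It assumes (34) in unsquared form, $h_a = \bar r_a/\sqrt{\bar r_{ij}}$, and (36) in the form $[h_a^2 + (s-1)\bar r_a]/(s\sqrt{\bar r_{ij}})$. Setting $h_a^2 = \bar r_a$ makes the two agree for any $s$ and $\bar r_{ij}$. The function names and numbers are hypothetical, and the reference variables are supplied by the caller, since the text picks them by congruence rather than by size of $r$.

```python
import math
from statistics import mean


def quick_estimate(R, a, refs):
    """Formula (38): h_a^2 estimated as the mean r of a with its reference variables."""
    return mean(R[a][j] for j in refs)


def h_from_34(rbar_a, rbar_ij):
    """h_a by (34), unsquared: rbar_a / sqrt(rbar_ij)."""
    return rbar_a / math.sqrt(rbar_ij)


def h_from_36(h2, rbar_a, rbar_ij, s):
    """h_a by (36), assuming the form (h_a^2 + (s - 1) rbar_a) / (s sqrt(rbar_ij))."""
    return (h2 + (s - 1) * rbar_a) / (s * math.sqrt(rbar_ij))


# With h_a^2 set to rbar_a, (34) and (36) give the same h_a.
rbar_a, rbar_ij, s = 0.48, 0.42, 4  # hypothetical values
print(h_from_34(rbar_a, rbar_ij), h_from_36(rbar_a, rbar_a, rbar_ij, s))
```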


Communality From Successive Residual Clusters
(Factoring)

As shown in the 3rd and 4th definitions, the communality of a variable can be found by cumulating the partial communalities from $k$ independent cluster dimensions $C_1, C_2, \ldots, C_k$ secured by a factoring procedure. Factoring can be performed in different ways, all special cases of general cluster formulations.

General principles of factoring, or successive residual cluster analysis


The factoring process, in general, follows this sequence. A first cluster $C_1$, consisting of variables $a, b, \ldots, s_1$, is selected, whose domain scores are defined as in (24). This domain score is

$$(39)\qquad C_1 = \Sigma z_a + \Sigma z_b + \cdots + \Sigma z_{s_1}.$$

The correlation of variable $a$ with the domain score $C_1$ is then computed by the general formula given in (25) for the correlation of a variable with a domain score. For any variable $v$ that may not be included in $C_1$, the general formula is the same as (25) except for the numerator, namely,

$$(40)\qquad r_{vC_1} = \frac{\sum_j r_{vj}}{\sqrt{\sum_i h_i^2 + 2\sum r_{ij}}}, \qquad (i, j = a, \ldots, s_1;\ i < j \text{ in the last sum}).$$

A second residual cluster is now selected, composed of $s_2$ variables. Its domain score is ${}_1C_2$ (written simply as $C_2$), defined as a composite of the $\Sigma z$ scores of its $s_2$ variables. The correlations of variable $a$ and variable $v$ with $C_2$ are computed by formulas identical with (25) and (40), respectively, except that the $h^2$ and $r$ terms are residual communalities and correlations. The analyst continues to select additional successive clusters $C_3, \ldots, C_k$ until all residual correlations vanish.
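The sequence above can be sketched in Python as follows. The sketch assumes the clusters are chosen by the analyst beforehand and passed in as index lists. The working matrix holds the trial $h^2$ values on its diagonal, so the numerator of a variable's loading includes its own $h^2$ only when it belongs to the cluster, as in (25) versus (40). After each cluster the matrix is residualized. The squared loadings are cumulated into partial communalities. Passing all indices as the first cluster corresponds to the centroid choice $s_1 = n$, although this sketch does none of the sign reflection that centroid practice uses for later factors. Names and the test matrix are hypothetical.

```python
import math


def cluster_loadings(W, cluster):
    """Formulas (25)/(40): correlation of every variable v with the cluster domain.

    W is the working (residual) matrix with communalities on its diagonal, so
    sum(W[v][j] for j in cluster) includes h_v^2 only when v is in the cluster.
    """
    denom_sq = sum(W[i][j] for i in cluster for j in cluster)
    if denom_sq <= 0:
        raise ValueError("cluster has no positive residual variance")
    denom = math.sqrt(denom_sq)
    return [sum(W[v][j] for j in cluster) / denom for v in range(len(W))]


def successive_residual_clusters(R, h2, clusters):
    """Factor R cluster by cluster, residualizing after each, and cumulate the
    partial communalities r_{aC_1}^2 + ... + r_{aC_k}^2."""
    n = len(R)
    W = [[h2[i] if i == j else R[i][j] for j in range(n)] for i in range(n)]
    partial = [0.0] * n
    for cluster in clusters:
        f = cluster_loadings(W, cluster)
        W = [[W[i][j] - f[i] * f[j] for j in range(n)] for i in range(n)]
        partial = [p + fi * fi for p, fi in zip(partial, f)]
    return partial, W


# Hypothetical matrix built from two dimensions, with known communalities.
L = [(0.8, 0.2), (0.7, 0.1), (0.1, 0.6), (0.2, 0.5)]
R = [[1.0 if i == j else sum(x * y for x, y in zip(L[i], L[j])) for j in range(4)]
     for i in range(4)]
true_h2 = [sum(x * x for x in row) for row in L]
h2, residual = successive_residual_clusters(R, true_h2, [[0, 1], [2, 3]])
```

With the correct communalities on the diagonal and two clusters that span both dimensions, the cumulated values reproduce the true $h^2$ and the residuals vanish.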


Special cases (Centroid, Key Cluster, Square Root, Principal Component, 
and other methods) 


The various factoring methods are merely different special cases of this general formulation [13]. Notice that in the general definition of a cluster domain score by (24) there are two sets of parameters. The first is $s$, the number of variables selected out of the total $n$. The second refers to the number of parallel variables in each $\Sigma z$ term. Each $\Sigma z_i$ may be written as


$$(41)\qquad \Sigma z_i = z_i + z_{i'} + z_{i''} + \cdots \quad (n_i \text{ terms}).$$


For the domain score $C_1$, $n_i$ equals $n_\infty$ within each term of (39). But it may be set at any finite number. Let us look at the parameters selected by the various factoring methods.

In the Thurstone Centroid Method of factoring [1, 7], the analyst sets $s = n$ in (39); that is, $C_1$ is the omnibus cluster domain score consisting of all $n$ variables. The formula for the squared centroid factor loading may be recognized as our general formulas (25) and (40) with $s_1 = n$. Analogously, in this method one sets $s_1 = s_2 = \cdots = s_k = n$. For the second set of parameters, the centroid method sets $n_1 = n_2 = \cdots = n_\infty$ in (41).

In the Key Cluster Method as developed by the writer, the analyst selects as the successive residual clusters different groups of residual variables that are most independent of each other. Here, $s_1, s_2, \ldots, s_k$ refer to the different clusters, and all are less than $n$. Here, also, $n_1 = n_2 = \cdots = n_\infty$. The gains in efficiency, accuracy, and structural analysis of this method are developed elsewhere [10, 12].

In the Square Root Method of factoring, each successive cluster consists of a single variable (that is, $s = 1$), the one that shows maximal residual correlations. Here $s_1 = s_2 = \cdots = 1$. The main proponent of this method is Wrigley [14], who also recommends it for special [...]. There are other factoring methods that specialize in setting the independent dimensions $C_1, C_2, \ldots, C_k$ as fallible scores. Here, $n_1 = n_2 = \cdots = 1$. Notable is Hotelling's Method of Principal Components [4], which also sets $s = n$ and, in addition, weights the scores of the variables in each residual composite. In this weighted case, note that (25) reduces to [...] by putting the magnitude 1.00 in place of the $h^2$ value. Methods in which unities are put in the diagonal, as well as those that define the residual clusters in such a fashion as to require reliability coefficients in place of $h^2$ values in the general formula, cannot be considered further here.


Determining $h^2$ by factoring

The procedure for computing the communalities of the $n$ variables by factoring is to insert trial values of $h^2$ in the general formula (25). After the factoring is complete, new trial values of $h^2$ are cumulated from the partial communalities by formula (7). These are plugged back into (25) for the second factoring, new values are obtained, and the process is continued to convergence. By the Key Cluster Method [12] convergence is on the true values; the results for the Guilford problem are shown in Table 1, row 2. But in the centroid method as commonly used, the trial values taken are so inexact, and the [...] so large, that most centroid analysts rarely carry the process to convergence. Guilford does not, for example, [...] for his illustrative problem. The usual practice in centroid factoring is to accept as final the values by (7) that result from the initial factoring. This practice yields a further approximation.

Approximation F: $h^2$ from initial partial communalities (or "squared factor loadings") from initial factoring


The values in the Guilford problem by approximation F from centroid factoring are shown in Table 1, row 4. Note that the values are about as good as those simply calculated from single-cluster formulas (rows 8, 9, 10, 11). The mean absolute deviation from the true values is .020 for the centroid method and about the same for the single-cluster methods.

For comparison, take the corresponding values secured from initial factoring by the Key Cluster Method in CCC analysis. For the Guilford problem these values, in Table 1, row 3, are almost exactly the true values, their mean absolute deviation being only .003. The poor showing of the centroid method stems from its indiscriminate selection of all $n$ variables for the successive residual clusters. A second cause is the practice of choosing inferior ad hoc trial values, as illustrated in Table 1, rows 14 and 15.
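The refactoring loop described under "Determining $h^2$ by factoring" can be sketched as below. This is a sketch under assumptions: the cluster sequence is held fixed from one pass to the next (the text does not say whether clusters are reselected on refactoring), convergence is declared when no $h^2$ changes by more than a tolerance, and nothing here guarantees convergence for arbitrary trial values. All names and the test matrix are hypothetical.

```python
import math


def factor_once(R, h2, clusters):
    """One factoring pass: residual cluster loadings by (25)/(40), cumulated by (7)."""
    n = len(R)
    W = [[h2[i] if i == j else R[i][j] for j in range(n)] for i in range(n)]
    total = [0.0] * n
    for cluster in clusters:
        denom_sq = sum(W[i][j] for i in cluster for j in cluster)
        if denom_sq <= 0:
            raise ValueError("non-positive residual cluster variance")
        d = math.sqrt(denom_sq)
        f = [sum(W[v][j] for j in cluster) / d for v in range(n)]
        W = [[W[i][j] - f[i] * f[j] for j in range(n)] for i in range(n)]
        total = [t + x * x for t, x in zip(total, f)]
    return total


def refactor_to_convergence(R, h2_start, clusters, tol=1e-8, max_iter=200):
    """Put the cumulated h^2 back on the diagonal and refactor until it stops changing."""
    h2 = list(h2_start)
    for iteration in range(1, max_iter + 1):
        new = factor_once(R, h2, clusters)
        if max(abs(x - y) for x, y in zip(new, h2)) < tol:
            return new, iteration
        h2 = new
    return h2, max_iter


# Check on a hypothetical two-dimensional matrix: the true communalities are a fixed point.
L = [(0.8, 0.2), (0.7, 0.1), (0.1, 0.6), (0.2, 0.5)]
R = [[1.0 if i == j else sum(x * y for x, y in zip(L[i], L[j])) for j in range(4)]
     for i in range(4)]
true_h2 = [sum(x * x for x in row) for row in L]
h2, iterations = refactor_to_convergence(R, true_h2, [[0, 1], [2, 3]])
```

In practice the starting values would come from a quick approximation such as B rather than from known communalities. The check only confirms that the true values are reproduced once they are reached.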


Communality From Multiple Clusters 


Another means of computing the communality of a variable stems from the definition of the communality as the predictable variance from the variable-domains organized by oblique clusters. Communality by definition 4 is the predictable variance from variable-domains organized through factoring into $k$ independent clusters $C_1, C_2, \ldots, C_k$ (9). In definition 5, the variable-domains are taken separately as predictors $C_b, C_c, \ldots, C_n$ (21). Now group them into oblique predictor cluster domains $C_I, C_{II}, \ldots, C_{k'}$.


Definition 7: $h^2$, the squared multiple $R$ with $k'$ oblique cluster domains


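To form oblique cluster domains, $C_a, C_b, \ldots, C_n$ are grouped into $k'$ clusters in which the variables are as congruent as possible. Let, for example,

$$(42)\qquad C_I = \sum_{j \in I} C_j = \sum_{j \in I} \Sigma z_j,$$

$$(43)\qquad C_{II} = \sum_{j \in II} C_j = \sum_{j \in II} \Sigma z_j,$$

and so on to the $C_{k'}$th domain that exhausts the variables, the parallel variables in each $\Sigma z$ term satisfying definitions (1), (2), and (3). Since $C_I, C_{II}, \ldots, C_{k'}$ are linear composites of the $n$ variable-domains, then by definitions 4 and 5,

$$(44)\qquad h_a^2 = R^2_{a\cdot C_I C_{II} \cdots C_{k'}}.$$

As with (12) and (13), the regression equation and multiple $R^2$ (or $h^2$) may be written

$$(45)\qquad \hat z_a = \sum_j \beta_{aC_j} C_j, \qquad (j = I, II, \ldots, k');$$

$$(46)\qquad h_a^2 = R^2_{a\cdot C_I C_{II} \cdots C_{k'}} = \sum_j \beta_{aC_j}\, r_{aC_j},$$

where the $\beta_{aC_j}$ satisfy the normal equations $r_{aC_i} = \sum_j \beta_{aC_j}\, r_{C_iC_j}$.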


One can evaluate (46) by a quick convergence process as follows:

(i) Evaluation of $r_{aC_j}$. Compute $r_{aC_j}$ by the general cluster correlation formula (25), using approximation B for the communalities. Recall that for any variable $v$ not in a given cluster $C_j$, the same formula (25) applies, except that $h_v^2$ is not in the numerator, symmetrically with formula (40). With $r_{aC_j}$ and $r_{vC_j}$ evaluated, the $\beta$ terms of (46) may then be solved.

(ii) Evaluation of $r_{C_iC_j}$. The remaining terms to be evaluated are the $r_{C_iC_j}$. To be concrete, consider the case of $r_{C_IC_{II}}$, where $C_I$ and $C_{II}$ are as defined in (42) and (43). Recalling definitions (1), (2), and (3), this correlation becomes, by the correlation of sums,

$$(47)\qquad r_{C_IC_{II}} = \frac{\sum_{i \in I}\sum_{j \in II} r_{C_iC_j}}{\sqrt{\sum_{i \in I}\sum_{i' \in I} r_{C_iC_{i'}}}\;\sqrt{\sum_{j \in II}\sum_{j' \in II} r_{C_jC_{j'}}}}, \qquad r_{C_iC_i} = 1.$$

To evaluate this expression, the augmented correlation $r_{C_aC_b}$ between two variable-domains is needed. This augmented correlation is ([12], formula 21)


$$(48)\qquad r_{C_aC_b} = \frac{r_{ab}}{h_a h_b}.$$


To secure all the $r_{C_iC_j}$ values of (47) would require setting up the covariance matrix of the variables of $C_I$ and $C_{II}$, augmenting the $r$'s by the formula of type (48) using the $h$ values from step (i) above, and then inserting unities in the leading diagonal. Then the numerator of (47) is the sum of the terms of the side submatrix between the variables of $C_I$ and $C_{II}$. The term in the denominator under the first radical is the sum of the terms in the $C_I$ diagonal submatrix, and that under the second radical is the sum of the terms in the $C_{II}$ diagonal submatrix.

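This work is unnecessarily laborious. An almost exact value of $r_{C_IC_{II}}$ can be secured by defining $C_I$ and $C_{II}$ as representative domains $C_r$, as defined in (32). In this case, by the correlation of sums,

$$(49)\qquad r_{C_IC_{II}} = \frac{\bar r_{I,II}}{\sqrt{\bar r_I}\,\sqrt{\bar r_{II}}},$$

where $\bar r_{I,II}$ is the mean correlation between the variables of $C_I$ and those of $C_{II}$, and $\bar r_I$ and $\bar r_{II}$ are the mean intercorrelations within each cluster.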


In the Guilford problem, where the exact value is known, $r_{C_IC_{II}}$ is found by the rigorous formula (47) to be .306; by the easy formula (49) it is .307.

(iii) Iteration to $h^2$. The process is first to find the initial $h^2$ values by steps (i) and (ii) above. These values are then plugged back into (46) on the right to compute new $h^2$ values, and the cycle continues until the $h^2$ values converge. The process is speedy, as will be seen below, where it is also seen that even this work can be shortened.
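A Python sketch of steps (i) to (iii) rests on three stated assumptions. First, the cluster correlations $r_{aC_j}$ follow (25) and (40), with the current $h^2$ values on the diagonal. Second, the $r_{C_iC_j}$ come from the correlation of sums with that same diagonal, a simplification standing in for the augmented-correlation route of (47) and (48). Third, the $\beta$'s come from the normal equations, solved by plain Gaussian elimination. The function names and the two-dimensional test matrix are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def oblique_h2(R, h2, clusters):
    """One pass of (46): h_a^2 = sum_j beta_j r_{aC_j}, beta from the normal equations."""
    n = len(R)
    W = [[h2[i] if i == j else R[i][j] for j in range(n)] for i in range(n)]

    def block(A, B):
        return sum(W[i][j] for i in A for j in B)

    sd = [math.sqrt(block(C, C)) for C in clusters]
    # Assumption: r_{C_I C_II} by the correlation of sums with h^2 on the diagonal.
    rCC = [[block(A, B) / (sd[p] * sd[q]) for q, B in enumerate(clusters)]
           for p, A in enumerate(clusters)]
    new = []
    for a in range(n):
        r_aC = [sum(W[a][j] for j in C) / sd[p] for p, C in enumerate(clusters)]
        beta = solve(rCC, r_aC)
        new.append(sum(bj * rj for bj, rj in zip(beta, r_aC)))
    return new


def iterate_oblique(R, h2_start, clusters, tol=1e-8, max_iter=100):
    """Step (iii): feed the h^2 values back into (46) until they converge."""
    h2 = list(h2_start)
    for iteration in range(1, max_iter + 1):
        new = oblique_h2(R, h2, clusters)
        if max(abs(x - y) for x, y in zip(new, h2)) < tol:
            return new, iteration
        h2 = new
    return h2, max_iter


L = [(0.8, 0.2), (0.7, 0.1), (0.1, 0.6), (0.2, 0.5)]  # hypothetical loadings
R = [[1.0 if i == j else sum(x * y for x, y in zip(L[i], L[j])) for j in range(4)]
     for i in range(4)]
true_h2 = [sum(x * x for x in row) for row in L]
h2, iterations = iterate_oblique(R, true_h2, [[0, 1], [2, 3]])
```

The two oblique clusters here span the two dimensions that generated the matrix, so the true communalities are a fixed point of the iteration. This mirrors the text's point that $k$ well-chosen clusters suffice.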
Definition 8: $h^2$, the squared multiple $R$ with $k$ most independent oblique cluster domains

To use all $k'$ clusters as predictors is inefficient. Only $k$ predictor clusters are necessary and sufficient, the number $k$ being that found if one factored the correlation matrix.

This fact can be demonstrated from the Guilford matrix, which was constructed artificially from two dimensions. When the writer clustered the matrix, the ten variables fell into $k' = 3$ clusters. From these he chose the $k = 2$ most independent, that is, those whose variables correlated the lowest. The communalities of the variables were then computed by (46), following the three steps outlined above. The first round gave the initial communalities shown in Table 1, row 12a, which miss the true values on the average by only .006. Convergence is rapid: on the 4th iteration the values in row 12b show a mean absolute deviation of less than .001.

Since the analyst does not know the value of $k$ ahead of time, a procedure that provides an efficient solution is needed. If, for each variable, one orders the $r$'s with the $k'$ clusters in order of magnitude, then one can choose as predictor clusters only those that have significant correlations with the variable. Employing (47) for these predictor clusters should give close to the exact value of the variable's communality.


Summary and Conclusions 


The communality $h_a^2$ of a variable $a$ refers to its generality across $n - 1$ other behaviors $b, c, \ldots, n$. It is the variance of scores on $a$ predicted from the domain score $C_a$. Here $C_a$ is a construct composite of scores on a large number of different behaviors whose correlations with the $n - 1$ variables are proportional to those of variable $a$. It may thus be written

$$(5),\ (30)\qquad h_a^2 = r^2_{aC_a}.$$

It follows that the communality is (i) the correlation coefficient between scores on $a$ and scores on another construct behavior $a'$ having exactly the same coefficients as $a$ with the other $n - 1$ variables, i.e.,

$$(1)\qquad h_a^2 = r_{aa'},$$

and (ii) the variance of $a$ predicted from the remaining $n - 1$ variable-domains $C_b, C_c, \ldots, C_n$, that is, the squared multiple $R$,

$$(11)\qquad h_a^2 = R^2_{a\cdot C_bC_c\cdots C_n}.$$


In this form it may be calculated by an electronic computer from the simultaneous quadratic formula (21). The $h^2$ values of the $n - 1$ variables secured from approximation B, formula (29), initiate the iterative process.

But the domains $C_a, C_b, \ldots, C_n$ that predict the communalities of all $n$ variables can always be grouped, without loss of predictive power, into $k$ independent cluster or residual cluster dimensions ("factors") $C_1, C_2, \ldots, C_k$. The predicted variance of $a$ is then the squared multiple $R$,

$$(9)\qquad h_a^2 = R^2_{a\cdot C_1C_2\cdots C_k}.$$

The computing form of (9), suitable for desk calculator or electronic computer operations, requires cumulating the $k$ partial communalities:

$$(7)\qquad h_a^2 = r^2_{aC_1} + r^2_{aC_2} + \cdots + r^2_{aC_k}.$$


By the Key Cluster Method of factoring [12], the partial communalities are secured initially by approximation B, (29) or (30). After the first factoring, the improved estimates of the communalities by (7) permit refactoring to secure better estimates. Refactoring continues until the communalities converge.

The predictor domains $C_b, C_c, \ldots, C_n$ in (11) can also, without loss of predictive power, be grouped into $k'$ oblique cluster domains $C_I, C_{II}, \ldots, C_{k'}$. This grouping is subject to two limiting conditions: $k' > k$, and the $k'$ cluster domains lie in $k$ space. That is,

$$(44)\qquad h_a^2 = R^2_{a\cdot C_IC_{II}\cdots C_{k'}}.$$


The computing form of (44) is given in (46), which is generally symmetrical 
with (21) but with fewer predictor variables. The practical limitation of this 
formulation is the need to know k. 

Approximations to the communality, useful especially as values to 
initiate iterations to the exact value, will approach this value to the degree 
that the approximation formulas employed meet the definition of the com- 
munality. By the definition expressed in (5) it follows that approximations 
B, C, $D_1$, $D_2$, and E give quickly computed values close to the exact value
according as the variables that comprise the reference cluster of the variable 
have correlations congruent with it. By approximation F, the cumulated 
partial communalities by (7) or (9) resulting from only the first factoring 
require more work than approximations B, C, D, and E; in the Guilford 
example, values secured by key cluster factoring are considerably more 
exact than by the centroid. Approximation A, the squared multiple R, is 
clearly a biased estimate of (11), the predictor variables being the n — 1 
fallible measures rather than the required n — 1 variable-domains. The 
highest r and modified highest r are poor estimates, since by definition (1) 
there is no implication that the required perfectly congruent parallel variable 
a’ is, or should be, well represented by that observed variable with which a 
correlates highest. 

The present account has been concerned with the logic and algebra of communalities. Three methods have been shown to give the exact values for an artificial matrix with known theoretical population communalities. We need to evaluate the three methods on empirical matrices. This problem is more difficult, because the population values are unknown and because the sampling errors of the communalities must be taken into account. Treating the communality as a squared multiple $R$, as given in (11), should simplify the problem of deriving this sampling error.


REFERENCES

[1] Cattell, R. B. Factor analysis. New York: Harper, 1952.

[2] Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.

[3] Guttman, L. Multiple rectilinear prediction and the resolution into components. Psychometrika, 1940, 5, 75-99.

[4] Hotelling, H. Analysis of a complex of statistical variables into principal components. J. educ. Psychol., 1933, 24, 417-441, 498-520.

[5] Kaiser, H. Solution for the communalities: a preliminary report. Contract Report 5, 27 Sept. 1956, Univ. Calif., Contract No. AF 41(657)-76.

[6] Spearman, C. The abilities of man. London: Macmillan, 1927.

[7] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.

[8] Tryon, R. C. Cluster analysis. Ann Arbor: Edwards Bros., 1939.

[9] Tryon, R. C. Identification of social areas by cluster analysis. Univ. Calif. Publ. Psychol., 1955, 8, No. 1, 1-100. Berkeley: Univ. Calif. Press.

[10] Tryon, R. C. General dimensions of individual differences: Cluster analysis vs. multiple-factor analysis. Amer. Psychol., 1956, 11, 479. (Title)

[11] Tryon, R. C. Reliability and domain validity: Reformulation and historical critique. Psychol. Bull., 1957, 54, 229-249.

[12] Tryon, R. C. Cumulative communality cluster (CCC) analysis. Contract Report 7, 8 Nov. 1956, Univ. Calif. Contract No. AF 41(657)-76.

[13] Tryon, R. C. Domain sampling formulation of cluster and factor analysis. Contract Report 18, July 1957, Univ. Calif. Contract No. AF 41(657)-76.

[14] Wrigley, C. F., Cherry, C. N., Lee, M. C., and McQuitty, L. L. Use of the square-root method to identify factors in the job performance of aircraft mechanics. Psychol. Monogr., 1956, 70, No. 23. (Whole No. 430.)

[15] Wrigley, C. F. An empirical comparison of various methods for the estimation of communalities. Contract Report 1, 30 June 1956, Univ. Calif. Contract No. AF 41(657)-76.


Manuscript received 12/17/56 
Revised manuscript received 2/25/57 


PSYCHOMETRIKA, VOL. 22, NO. 3
SEPTEMBER, 1957 


AMOUNTS OF FIXATION AND DISCOVERY IN MAZE LEARNING BEHAVIOR


HERBERT A. SIMON*


CARNEGIE INSTITUTE OF TECHNOLOGY


The proposed quantitative description of maze learning rests on the assumption that two independent processes are involved: (i) a discovery process based on trial-and-error search for the correct response, and (ii) a fixation process equivalent to that observed in serial learning. The model leads to predictions that are consistent with the available experimental data. In particular, the number of trials required for fixation is independent of the number of alternatives at each choice point (and hence independent of the number of bits of information contained in each correct response).


To learn the correct path through a maze, a subject must first discover the path and then fixate it in memory. The distinction between the processes of discovery and fixation is well known in the psychological literature [4] but has not been much used for analyzing learning experiments quantitatively. This paper shows how amounts of discovery and fixation can be estimated from the structure of a maze learning task, and how these amounts can be used to predict number of trials and number of errors to criterion. It also shows how an analysis of maze learning in these terms brings these experiments into relationship with classical experiments on the learning of lists of nonsense syllables.


The Theoretical Model 


To run a maze successfully, the subject must make a correct sequence of responses, e.g., "left, left, up, right, left, up." At each choice point, the response must be selected from a specified list of alternatives. There is a specified number of choice points. Hence, a particular maze may be characterized by two parameters: n, the number of alternative paths, or possible responses, at each choice point; and L, the number of choice points, or length of the maze. As measures of learning, only total errors to criterion E and trials to criterion T will be considered.

It is postulated that the learning behavior involves two simple processes: one of discovery and one of fixation. It is convenient to discuss them in reverse order.

*I am grateful to W. J. Brogden, G. A. Miller, R. F. Thompson, and J. Voss for their helpful comments on an earlier draft of this paper.


Fixation 

The learning may occur under either the correction or the non-correction 
method. Under the correction method, the subject makes a sequence of 
responses at each choice point until he makes the correct response; he is then 
informed that it was correct and proceeds to the next choice point. Under 
the non-correction method, if the subject’s first response is incorrect, he is 
told the correct response and then proceeds to the next choice point. Under 
both methods one correct response is reinforced on each trial at each choice 
point. When animals are used as subjects, and when the learning involves a 
physical maze, the correction method is ordinarily used. In experiments on 
human learning in verbal mazes, either the correction or the non-correction 
method may be used. 

Ignoring, for the moment, the activities of the subject in searching for 
the correct response (i.e., the discovery activities) and the range of alternative
responses open to him, it is seen that the task of fixating the correct response
in a verbal maze learning experiment is the same as the task of fixating the 
correct response in learning a series of syllables or digits. Assuming that 
these two processes are, indeed, the same, one is justified in using the ex- 
perimental findings on serial learning to predict the course of the fixation 
process in maze learning. This assumption will be validated by testing the 
predictions to which it leads, and by relating it to other recent findings on 
memory processes. 

Available data on serial learning ([3], p. 620, Fig. 8) show, for sequences 
of the lengths employed in maze learning experiments (L no greater than 24), 
that the total number of trials to criterion increases monotonically with the 
length of the sequence: 


$$(1)\qquad T = f(L), \quad \text{with } dT/dL > 0.$$


T appears to increase proportionately with L; within the range con- 
sidered, the departure from proportionality is not large. Accepting the 
evidence for proportional increase, (1) may be specialized to the linear 
relation 


$$(1')\qquad T = bL, \quad b > 0.$$


Both the general and special forms of the function will be used in what 
follows. 

An initial hypothesis is that these functions, obtained from the empirical 
data on fixation in serial learning, are applicable to the fixation process in 
maze learning. It may be remarked that b depends both on the ability of the 
subjects and the difficulty of the material to be learned. For nonsense syllables
with low association value, values of b in the neighborhood of 1 are often 
reported; it is remarkable that values from studies carried out at widely 
different times are quite similar ([3], p. 620, Fig. 8). Whether or not this




relative constancy carries over to learning in verbal mazes will be discussed
in a later section. 

Implicit in the hypothesis that the function (1) applies to the fixation 
process in maze learning is the very strong assumption that T is independent of n. It is not at all obvious that the difficulty in fixating the correct response at a choice point in a maze should be independent of the number of alternative responses at that point. It would be plausible to assume that the difficulty of fixation would increase with the amount of information contained in a correct response. But, by definition, the amount of information in a response is proportional to the logarithm of the number of alternative responses: one bit for a choice between two alternatives, two bits for a choice between four alternatives, three for a choice between eight alternatives, and so on. If difficulty of fixation depended on amount of information, T would depend on n.

Regardless of the plausibility of the assumption that T is independent 
of n, it is made tacitly in the whole body of experimentation that has been 
carried out on serial learning. The grounds for this assertion are these: a
subject, learning a series of nonsense syllables or digits, is at each step trying 
to select the correct response from some range of possible responses. But 
(with the exception of a recent experiment mentioned below) it has not 
been usual for the experimenter to specify for the subject the range of possible 
or admissible responses. The particular nonsense syllables used are selected 
from a much larger class upon which the subject could presumably draw in 
trying to choose the correct response. The size of this class is a possible 
source of variance in the fixation process that has not generally been con- 
trolled in the classical experiments. In the learning of verbal mazes, the 
number of alternatives at each branch point is made explicit, and hence the 
independence of T from n can be subjected to direct test. The experimental
findings are consistent with the hypothesis of independence. 

The finding that the number of trials required to learn a sequence of 
syllables depends primarily on the number of syllables to be learned and not 
upon the number of bits of information per syllable is parallel with the data 
of Miller [5] on the span of immediate memory, recently corroborated directly 
by Miller and Smith [6] for rote memory. These results suggest the need of 
caution and sophistication in applying measurements from statistical in- 
formation theory to learning experiments. As Miller [6] has pointed out, 
information measurements appear to be directly applicable to certain ex- 
periments in perception and discrimination but not to memory span experi- 
ments. These findings provide additional justification for the absence of n 
from (1), above. 


Discovery

To simplify the development, assume (i) when the subject is at a particular choice point on a particular trial, he either knows or does not know the correct response; (ii) if the former, he gives the correct response at once; (iii) if the latter, he tries responses at random until he hits on the correct one. It is well known, empirically, that assumption (iii) is not literally correct, since subjects ordinarily try alternatives in a patterned way. However, as long as the pattern is (for the average over all subjects) uncorrelated with the location of the correct response, results will be unchanged. There may then be no need for a more accurate but more complicated substitute for (iii).

Under the above assumptions, the expected number of errors $E_t$ on a given trial will be proportional to the product of two quantities: the average number of errors per random search, $\tfrac12(n-1)$, and the number of unlearned responses at the beginning of the trial, $U_{t-1}$:

$$(2)\qquad E_t = \tfrac12 (n-1)\, U_{t-1}.$$
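The factor $\tfrac12(n-1)$ in (2) is what random search gives if the subject tries the alternatives in a random order without repeating any. That reading of "at random" is an assumption of this sketch. The Python below checks it by enumerating every possible order for small n and then applies (2). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def expected_errors_random_search(n):
    """Average number of wrong responses before the correct one when the n
    alternatives are tried in a random order without repetition (assumption)."""
    orders = list(permutations(range(n)))
    correct = 0  # label of the correct alternative; by symmetry any label works
    total_errors = sum(order.index(correct) for order in orders)
    return Fraction(total_errors, len(orders))


def expected_errors_on_trial(n, unlearned):
    """Formula (2): E_t = (1/2)(n - 1) U_{t-1}."""
    return Fraction(n - 1, 2) * unlearned


for n in range(1, 6):
    print(n, expected_errors_random_search(n))  # equals (n - 1)/2
```

If the subject could repeat alternatives, the expected errors per search would be n - 1 instead, so the without-repetition reading is what matches the text's factor.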

Drawing again upon the empirical data that describe the fixation process in serial learning, some of the regularities in these data allow the derivation from (2) of an equation for E, the total number of errors to criterion. In particular, it is assumed that the Kjerstad-Robinson law (cf. [3], p. 619) applies to the fixation process in maze learning. This law asserts that the percentage of responses learned through the $t$th trial, $(L - U_t)/L$, is a function only of $t/T$, say:

$$(3)\qquad \frac{L - U_t}{L} = g(t/T).$$

By the definition of $E$ as the sum of the $E_t$, and by using (3) to eliminate $U_{t-1}$ from the right side of (2),

$$(4)\qquad E = \sum_{t=1}^{T} E_t = \tfrac12 (n-1) L \sum_{t=1}^{T}\left[1 - g\!\left(\tfrac{t-1}{T}\right)\right] = \tfrac12 (n-1) L T - \tfrac12 (n-1) L \sum_{t=1}^{T} g\!\left(\tfrac{t-1}{T}\right).$$

But, by a well-known theorem on homogeneous functions,

$$(5)\qquad \int_0^T g(t/T)\,dt = T\int_0^1 g(x)\,dx = KT, \quad \text{where } K \text{ is a constant.}$$

Since the integral of (5) is an approximation to the last term of the sum on the right of (4), (4) may be rewritten approximately:

$$(6)\qquad E = \tfrac12 (1-K)(n-1) L T.$$

Finally, substituting (1) or (1') in (6):

$$(7)\qquad E = a(n-1) L f(L) \quad (a = \text{a constant}), \text{ or}$$

$$(7')\qquad E = a'(n-1) L^2, \text{ respectively.}$$
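To see how close the integral approximation in (5) and (6) is to the exact sum in (4), the Python below evaluates both for a hypothetical learning curve $g(x) = x$, for which $K = 1/2$. The curve and all numbers are illustrations, not data from the experiments cited. K is found by the midpoint rule so that other curves can be substituted.

```python
def errors_exact(n, L, T, g):
    """Formula (4): sum over trials of (2), with U_{t-1} = L * (1 - g((t - 1)/T)) from (3)."""
    return sum(0.5 * (n - 1) * L * (1 - g((t - 1) / T)) for t in range(1, T + 1))


def estimate_K(g, steps=10_000):
    """K = integral of g over [0, 1], by the midpoint rule."""
    return sum(g((i + 0.5) / steps) for i in range(steps)) / steps


def errors_approx(n, L, T, K):
    """Formula (6): E ~ (1/2)(1 - K)(n - 1) L T."""
    return 0.5 * (1 - K) * (n - 1) * L * T


def g_linear(x):
    """Hypothetical learning curve: the fraction learned grows linearly with t/T."""
    return x


K = estimate_K(g_linear)
print(errors_exact(2, 10, 20, g_linear), errors_approx(2, 10, 20, K))
```

For this particular g the exact sum exceeds the approximation by the constant $\tfrac14(n-1)L$, so the relative error shrinks as the number of trials T grows.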


From (6) it is apparent that the number of errors to criterion will vary proportionately with the number of alternatives at each choice point (more precisely, with n - 1) and with the product of the length of the maze by the number of trials. If the number of trials is, by (1'), proportional to the length, the number of errors, by (7'), will be proportional to the square of the length. All the quantities appearing in (1'), (6), and (7') are observables, and hence the equations can be fitted to data on learning performance in mazes. In fitting (1') and (7'), one degree of freedom, corresponding to the constant of proportionality b or a', is lost.

From the data of Robinson and Darrow reported by Hovland ([3], p. 619, Table 1), one can make a numerical estimate of $(1 - K)$ in (6), obtaining the value .41. Since this constant has only an empirical basis, it cannot be expected to be exact; in what follows the approximate value .4 will be used. If $(1 - K)$ is estimated at .4, no degrees of freedom are lost in fitting (6).


The Experimental Data 


The two principal bodies of evidence for testing the model are:

(i) a series of experiments by Brogden and his associates [1, 2, 8] on human learning in verbal mazes of various lengths and numbers of branch points;

(ii) a series of experiments with animals by Scott and Henninger [7] using two-alternative mazes of various lengths.


Number of trials to criterion 


All these data support equation (1'). Brogden and Schmidt [1, 2] obtain an average value for $b$ of .75 for 16-unit mazes, and .83 for 24-unit mazes. Thompson [8] obtains an average $b$ of .75 for 12-unit mazes. The fixation tasks in all three sets of experiments were of about the same difficulty. The subjects were also drawn from the same population (volunteers from the introductory psychology course at the University of Wisconsin). Hence the relative constancy of the values of $b$ cannot be regarded as a mere artifact. For this reason it is justifiable to pool the data from all three sets of experiments, using a single averaged value of $b$. (On the other hand, Scott and Henninger [7], in experiments using a variety of maze designs and animal subjects, obtained values for $b$ ranging from .7 to 2.) Finally, Thompson [8] compared the correction and non-correction methods. He found no significant difference between them in the relation of length of maze to trials to criterion.

The data of Brogden and Schmidt [1, 2] and of Thompson [8] show that the number of trials is independent of the number of alternatives at each choice point, as postulated in (1). In all three sets of experiments, the average number of trials to criterion was not significantly related to the number of alternatives. The one exception is that the average number of trials was usually significantly low for mazes having only two alternatives at each choice point. Hence, the assumption of independence holds strictly only for $n$ greater than 2. Thompson reports data for a 12-unit maze with $n$ ranging from 2 to 6. Brogden and Schmidt report data for a 16-unit maze with $n$ ranging from 2 to 8, and for a 24-unit maze with $n$ ranging from 2 to 12.

TABLE 1
Predicted and Actual Trials to Fixation and Errors to Criterion for Four Types of Mazes of Varying Length
(Data from Scott and Henninger [7])
[Rows: (a) T, actual; (b) T, estimated; (c) E, actual; (d) E, estimated. Columns: maze lengths for maze Types I, II, III, and IV. Estimated T obtained from equation (1'), with b estimated from the data for each type of maze; estimated E obtained from equation (6).]

That the number of trials varies proportionately with maze length is shown by the near-equality in values of $b$ obtained with mazes of similar design of lengths 12, 16, and 24. Table 1 provides additional evidence in the form of a comparison of the actual with estimated values of $T$ for mazes of various lengths in four of the designs studied by Scott and Henninger [7].


Number of errors to criterion 


Having tested that part of the theory which relates to number of trials, we now consider the data on errors. Brogden has made available to the writer the data on numbers of errors and trials to criterion for the 210 individual subjects in the experiments reported in [1] and [2]. $E$ is estimated for each subject by (6), using the estimated value of .4 for $(1 - K)$, the experimentally determined values for $n$ and $L$, and the observed values of $T$, call them $T_i$. Let $E_i$ be the observed values of $E$, and $E^*_i$ be the values estimated from (6). As pointed out above, no degrees of freedom are lost in this process, since all the parameters in (6) are estimated independently of the observed $E_i$.

The mean $E_i$ for all 210 subjects is 376.7, and the standard deviation is 347.8. Designating the error of estimate $d_i = E_i - E^*_i$, the arithmetic mean error (the mean of the $d_i$) is only $-12.48$. This implies that the least squares estimate of $(1 - K)$ is about .39, instead of the .4 estimated from the serial learning data. The reason is that the mean of the $E_i$, 376.7, is about 39/40 of the mean of the $E^*_i$, 389.2.

More remarkable, the estimates from (6) account for 93.7 per cent of the variance in the $E_i$. The variance of the $E_i$ is 5,334 × 10²; the variance of the $d_i$ is 335 × 10². The latter is only 6.3 per cent of the former. The coefficient of variation, the ratio of the standard deviation of the $d_i$ to the mean of the $E_i$, is .23.
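The three summary figures used above are the mean error, the share of the variance of the $E_i$ accounted for ($1 - \operatorname{var}(d)/\operatorname{var}(E)$), and the coefficient of variation. All three come from the observed and predicted error counts. Below is a minimal Python sketch of that bookkeeping. It assumes population variances (divisor $N$), which the paper does not specify. The four-subject data are invented, not Brogden's records.

```python
from statistics import fmean, pvariance


def fit_summary(observed, predicted):
    """Compare observed E_i with predicted E*_i the way the text does: mean error,
    proportion of the variance of E_i accounted for, and coefficient of variation."""
    d = [o - p for o, p in zip(observed, predicted)]  # errors of estimate d_i
    var_obs = pvariance(observed)
    var_d = pvariance(d)
    return {
        "mean_error": fmean(d),
        "variance_accounted_for": 1.0 - var_d / var_obs,
        "coefficient_of_variation": var_d ** 0.5 / fmean(observed),
    }


# Hypothetical data for four subjects, not Brogden's records.
print(fit_summary([100, 200, 300, 400], [110, 190, 310, 390]))
```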

The entire theory can be subjected to a severe test by estimating $E$ in the experiments of Brogden and Schmidt and of Thompson from (7'), employing a single average value of $a'$ obtained from the whole set of experiments. Estimating $b = .75$ from the experimental data, $a' = \tfrac{1}{2}(.4)(.75) = .15$. Substituting in (7') the values 12, 16, and 24 for $L$ gives the corresponding equations for $E$: $E(12) = 21.6(n - 1)$; $E(16) = 38.4(n - 1)$; $E(24) = 86.4(n - 1)$. The least squares regressions reported by Thompson [8] and by Brogden and Schmidt [1, 2] are: $E(12) = 31.1(n - 1) - 24.7$; $E(16) = 34.1(n - 1) + 2.3$; $E(24) = 93.4(n - 1) + 98.0$. The only degree of freedom lost was that used to estimate $b$, so these equations must be regarded as a remarkably close fit. Moreover, suppose the regression line is constrained to pass through the origin, as required by the theory. The actual data would then give $E(12) = 23(n - 1)$; $E(16) = 37(n - 1)$; and $E(24) = 96(n - 1)$. This provides, on the whole, an even closer fit.
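The predicted slopes in the preceding paragraph follow from $a' = \tfrac{1}{2}(1 - K)b$, obtained by substituting $T = bL$ from (1') into (6). Below is a small Python sketch of that arithmetic. It also includes the least squares slope of a line forced through the origin, which is the constrained fit the theory calls for. The function names are illustrative, not from the paper.

```python
def a_prime(one_minus_K: float, b: float) -> float:
    """a' in (7'): substituting T = bL into (6) gives a' = (1/2)(1 - K) b."""
    return 0.5 * one_minus_K * b


def predicted_slope(L: int, ap: float) -> float:
    """(7') E = a'(n - 1) L**2, so the predicted slope on (n - 1) is a' L**2."""
    return ap * L ** 2


def slope_through_origin(x, y):
    """Least squares slope of y = k x with the line forced through the origin."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


ap = a_prime(0.4, 0.75)  # (1 - K) = .4 and b = .75, as in the text
for L in (12, 16, 24):
    print(L, round(predicted_slope(L, ap), 1))
# 12 21.6
# 16 38.4
# 24 86.4
```

To compare with the constrained regressions quoted above, apply `slope_through_origin` to the values of $n - 1$ and the observed mean errors for each maze length.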

The test of (7') may also be made with the data for the 210 individual subjects that were used to test (6). Let $E'_i$ be the estimate of $E_i$ from (7'), taking $b = .8$. Then the mean $E'$ = 358.1, which is about 5 per cent below the observed mean. (Stated otherwise, a least squares estimate of $b$ would give a value of about .84.) The variance of the $(E_i - E'_i)$ is 1,039 × 10², or 19.5 per cent of the variance of the $E_i$. Hence, the estimate from (7') accounts for just over 80 per cent of the variance. By comparison, estimating the $E_i$ from (6) accounts for nearly 95 per cent. The additional source of estimation error lies, of course, in the deviations of the actual values of the $T_i$ from the values $T'_i$ estimated by (1').

In their first experiment, Brogden and Schmidt ([1], p. 239) adduce one piece of direct evidence for (2), which is required for the derivation of (6). $U_0$, the number of unlearned responses at the beginning of the first trial, is equal to $L$. Hence, from (2),

(8) $E_1 = \tfrac{1}{2}(n - 1)U_0 = \tfrac{1}{2}(n - 1)L$.

Brogden and Schmidt find that the data fit (8) within sampling error.


REFERENCES 


[1] Brogden, W. J. and Schmidt, R. E. The effect of number of choices per unit of a verbal maze on learning and serial position errors. J. exp. Psychol., 1954, 47, 235-240.

[2] Brogden, W. J. and Schmidt, R. E. Acquisition of a 24-unit verbal maze as a function of number of alternative choices per unit. J. exp. Psychol., 1954, 48, 335-338.

[3] Hovland, C. I. Human learning and retention. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1951. Pp. 613-689.

[4] Melton, A. W. Learning. In W. S. Monroe (Ed.), Encyclopedia of educational research. (Rev. Ed.) New York: Macmillan, 1950. Pp. 668-690.

[5] Miller, G. A. The magical number seven, plus or minus two. Psychol. Rev., 1956, 63, 
81-97. 

[6] Miller, G. A. Human memory and the storage of information. IRE Trans. on Infor- 
mation Theory, 1956, IT-2, 129-137. 

[7] Scott, T. C., and Henninger, L. C. The relation between length and difficulty in motor 
learning; a comparison with verbal learning. J. exp. Psychol., 1933, 16, 657-678. 

[8] Thompson, R. F. A comparison of correction and non-correction procedures on the 
acquisition of a 12-unit verbal maze. (Unpublished report). 


Manuscript received 9/27/56 
Revised manuscript received 3/4/57 


PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


A MEASURE OF COHERENCE FOR HUMAN 
INFORMATION FILTERS* 


RICHARD H. WILCOX


NAVAL RESEARCH LABORATORY 


When an information processing system is faced with an excess of input information, the task of selecting the items which are to get immediate processing is frequently assigned to a human being. A quantitative measure is derived of the extent to which a man avoids random activity during such filtering operations. It is expressed in terms of two parameters (normalized overload and correct proportion of selections), which are determined from experimentally available quantities. This coherence measure may be used for studies of random behavior, comparison of rules for selecting items, and perhaps prediction of human performance at filtering tasks.


In this modern age a situation frequently occurs in which an information processing system (machine and/or human) is presented with information at a rate greater than it can handle. More calls may be placed at a telephone switchboard than there are outgoing lines; the rate of plane arrivals at an airfield may be greater than the tower operator can handle; a radar may display more returns than the tracker can track. In such situations, some items are selected for immediate processing while others are filtered out, being either rejected or stored for later processing.

Such filtering is often performed by a human being. If he acts according to a single complete set of unambiguous rules, he can be said to be operating coherently. If instead he makes some selections by chance (casual selection) or alternates between several sets of rules by chance (desultory selection), he can be said to be operating at least in part randomly. A thorough investigation of such filtering operations should include not only the human filter's capacity and accuracy but also the extent to which his activities are random.
This paper presents a direct quantitative measure of the extent to which a human filter avoids random activity in the selection of informational items during periods of overload. In the course of processing information, a human may perform various transformations, integrations, and other operations. The only activity considered here, however, is his selection of the items to be processed further. The human filter will be represented schematically by a box (Fig. 1). Transfer characteristics are defined within the box, but these are solely for the purpose of relating the inputs and outputs. Such characteristics are not necessarily analogous to the actual mental processes involved in selection, nor are they intended to be.

FIGURE 1
Schematic Representation of a Human Information Filter
S refers to inputs or stimuli
R refers to outputs or responses
T refers to transfer channels relating the inputs to the outputs
[Diagram: stimulus subsets $S_a$, $S_b$; responses $R_a$, $R_b$; transfer channels $T_{aa}$, $T_{ab}$, $T_{ba}$, $T_{bb}$]

*The author is particularly indebted to Dr. Harold Glaser, of the U. S. Naval Research Laboratory, for many helpful suggestions during the development of this measure.

Several measures of the filter performance may be obtained from the 
inputs, transfer characteristics, and outputs. The simplest would be the 
ratio of correct selections to either total number of selections or total number 
of inputs, but these do not take random activity into account. Quastler and 
Buckley [3] employed similar functions in a measure, based on information 
theory, which provides the information one receives by being informed of the 
filter's specific selections. Errors are treated as equivocation, so that when 
the filter is operating in a completely random manner no information is 
obtained. It was this work which first interested the author in the human 
filtering problem. However, a direct measure of the extent of randomness, 
or lack of it, would be more convenient in selecting personnel for filtering 
activity and establishing proper priority rules for them to use. 


Derivation of the Measure


In this derivation capital letters are used as labels for stimuli, transfer channels, and responses (Fig. 1). Corresponding lower case letters refer to the number of items in these groups or passing through these channels. The human filter is confronted with a total of $s$ informational items, or stimuli. When $s$ exceeds the number of items he can handle in the time available, he selects only $r_a$ of them. His task is to divide the total set ($S$), containing $s$ items, into two sub-sets according to a complete program of unambiguous rules. Each item in the sub-set he selects ($R_a$) should have higher priority than any item in the sub-set he rejects ($R_b$). However, the items comprising his selections may or may not be those which should have been selected according to the rules.

$S_a$ will be defined to be the theoretically correct sub-set of higher priority stimuli which he should select, so that $s_a = r_a$ in number. The remaining stimuli will be grouped into $S_b$. The filter may make any number of selections between zero and $s$. The rules of selection must therefore allow all items in $S$ to be ranked by priority, so that exactly $s_a$ items can always be found that each outrank all of the remaining items. However, once the magnitude of $s_a$ is determined from $r_a$ and the specific constituents of $S_a$ have thereby been established, their relative priority is of no further concern.

If the human filter is operating completely coherently, i.e., exactly according to the rules, his responses $R_a$ are made only from the sub-set of correct stimuli $S_a$; he is selecting only through the transfer channel $T_{aa}$. In this case $t_{aa} = r_a$. His rejections $R_b$ comprise the other sub-set $S_b$, so that he is rejecting only through the transfer channel $T_{bb}$.

However, if he makes errors in his filtering activities, some of his selections must be made from the lower sub-set of stimuli $S_b$. Thus, in addition to his correct selections through transfer channel $T_{aa}$, he is making some erroneous selections through channel $T_{ba}$. The items of the sub-set $S_a$ that he does not include in $R_a$ because of incorrect selections must be rejected through transfer channel $T_{ab}$.

Suppose the human filter makes some of the selections according to the rules and then either guesses at the rest or reverts at random to an alternative set of rules. The desired measure should indicate what proportion of the responses were made coherently. This is similar to the problem of compensating for guessing in multiple-choice tests, and the same method of solution can be adapted to develop a proper measure here. In general, the procedure has four steps:

(i) count the number of erroneous selections;
(ii) compute the number of guesses from which that number of errors was most likely to result;
(iii) subtract the number wrong from the probable number of guesses to obtain the probable number of right guesses;
(iv) subtract the probable number of right guesses from the total number right to obtain the score.

In multiple-choice testing this results in the formula

(1) $S = R - [W/(n - 1)]$,


in which $S$ is the corrected score, $R$ is the number right, $W$ is the number wrong, and $n$ is the number of choices available for each answer ([1], p. 518). However, in multiple-choice tests the selection of an answer to a question is essentially independent of the preceding selections. For the human filter, by contrast, each selection changes the relative number of items available for right and wrong guesses, and therefore the probability of correctly guessing the next selection. In such a situation (sampling without replacement), the probability that exactly $w$ out of $n$ guesses will be wrong is given by a hypergeometric distribution ([2], pp. 182-183). From this distribution the expected number wrong $E(w)$ is found to be

(2) $E(w) = n s_b/(n + s_b)$.


(The assumption is made here that no wrong selections are made knowingly, so that when the human filter starts guessing all the $s_b$ lower priority items are available, originally, for wrong guesses.) Now the difference between the total number of guesses and the expected number of wrong guesses must be the expected number of right guesses $E(a)$; that is,

(3) $E(a) = n - E(w) = E^2(w)/[s_b - E(w)]$.

It remains only to subtract the expected number of right guesses $E(a)$ from the total number of correct selections $t_{aa}$. The result is the number of correct selections which were made coherently, i.e., by applying the rules and not by guessing. For convenience in comparing different experiments, the result should be normalized to the number of responses. Thus the resultant measure is

(4) $C = [t_{aa} - E(a)]/r_a$.
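Equations (2) and (3) can be checked exactly under one reading of the derivation (an interpretation; the paper does not state it in this form). Suppose the filter first makes $r_a - n$ coherent, correct selections and then guesses $n$ times. The items still available are then $n$ correct ones and all $s_b$ wrong ones. The guesses are therefore a draw of $n$ items without replacement from $n + s_b$ items. The Python sketch computes the hypergeometric mean directly with `math.comb` and compares it with (2). It then verifies the identity in (3). The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_wrong_hypergeometric(n: int, s_b: int) -> Fraction:
    """Mean number of wrong items among n guesses drawn without replacement from
    a pool holding n correct and s_b wrong items (the pool assumed for (2))."""
    total = comb(n + s_b, n)
    return sum(Fraction(w * comb(s_b, w) * comb(n, n - w), total)
               for w in range(0, min(n, s_b) + 1))


def expected_wrong(n: int, s_b: int) -> Fraction:
    """Equation (2): E(w) = n s_b / (n + s_b)."""
    return Fraction(n * s_b, n + s_b)


def expected_right_from_wrong(e_w: Fraction, s_b: int) -> Fraction:
    """Equation (3): E(a) = E(w)^2 / (s_b - E(w))."""
    return e_w ** 2 / (s_b - e_w)


# Exact agreement on a small grid of guess counts and pool sizes.
for n in range(1, 7):
    for s_b in range(1, 7):
        e_w = expected_wrong(n, s_b)
        assert expected_wrong_hypergeometric(n, s_b) == e_w
        assert expected_right_from_wrong(e_w, s_b) == n - e_w

e_w = expected_wrong(3, 5)
print(e_w, expected_right_from_wrong(e_w, 5))  # 15/8 9/8
```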
Just as with multiple-choice test scoring, the best likelihood estimate of the theoretically expected number wrong $E(w)$ is the actual number wrong $t_{ba}$. Making this change in (3) and substituting the result in (4) yields

(5) $C = \dfrac{t_{aa}}{r_a} - \dfrac{t_{ba}^2}{r_a(s_b - t_{ba})}$.

Let $L = s/r_a$ and $A = t_{aa}/r_a$. Using $t_{ba} = r_a - t_{aa} = r_a(1 - A)$ and $s_b = s - s_a = r_a(L - 1)$, algebraic manipulation of the filter parameters (Fig. 1) shows that

(6) $C = \dfrac{LA - 1}{L + A - 2}$.

The coherence measure $C$ is defined to be the normalized difference between the total number of correct selections and that portion of the correct selections which was obtained by guessing. Thus $C$ as given in (6) cannot be negative. Setting (6) equal to zero and solving for $A$ shows that $A = 1/L$ for any number of selections made. Then for completely coherent activity, $A$ equals one, and for completely random activity, $A$ is equal to $1/L$. The range of $A$ equals the unit interval, so values in the sub-interval $(0, 1/L)$ represent performance poorer than chance.

Now for $A < 1/L$ a corresponding $A'$ is found. The variation of $A$ below $1/L$, as a proportion of the interval $(0, 1/L)$, equals the variation of $A'$ above $1/L$ as a proportion of the remaining interval $(1/L, 1)$. This image value is

(7) $A' = 1 + A - LA$.

Substituting the expression for $A'$ from (7) in place of $A$ in (6), and changing the sign to indicate that the filter response is poorer than one would expect by chance, yields

(8) $C = -\dfrac{LA - 1 + L(1 - LA)}{L + A - 2 + (1 - LA)}$, $A < 1/L$.

Here $C$ varies between 0 and $-1$. $C$ as given in (6), which is used for $A > 1/L$, varies between 0 and $+1$.
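Equations (5) through (8) can be collected into one function. The Python sketch below uses hypothetical helper names and exact rational arithmetic via `fractions.Fraction`, so the checks are exact. It evaluates $C$ from $L$ and $A$, using (6) when $A \ge 1/L$ and otherwise the image value (7) substituted into (8). The text assigns (6) to $A > 1/L$; at $A = 1/L$ both branches give 0. The sketch also cross-checks (6) against (5) computed from raw counts.

```python
from fractions import Fraction


def coherence(L, A):
    """Coherence C from overload L = s/r_a and proportion correct A = t_aa/r_a:
    (6) when A >= 1/L, otherwise (8) via the image value A' of (7)."""
    if L <= 1:
        raise ValueError("coherence requires overload, L > 1")
    if A >= 1 / L:
        return (L * A - 1) / (L + A - 2)
    a_img = 1 + A - L * A                      # (7); lies in (1/L, 1]
    return -(L * a_img - 1) / (L + a_img - 2)  # (6) at A', with sign reversed


def coherence_from_counts(s, r_a, t_aa):
    """(5) from raw counts: t_ba = r_a - t_aa wrong selections stand in for E(w),
    and s_b = s - r_a; intended for A >= 1/L."""
    t_ba = r_a - t_aa
    s_b = s - r_a
    return (t_aa - Fraction(t_ba ** 2, s_b - t_ba)) / r_a


# Hypothetical trial: s = 20 items, r_a = 10 selections, t_aa = 8 correct.
print(coherence(Fraction(2), Fraction(4, 5)), coherence_from_counts(20, 10, 8))  # 3/4 3/4
# Worst case (A = 0) and chance level (A = 1/L) at overload L = 3.
print(coherence(Fraction(3), Fraction(0)), coherence(Fraction(3), Fraction(1, 3)))  # -1 0
```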


Discussion of the Measure and its Use 


As derived in the preceding section, the coherence measure is given by

(6) $C = \dfrac{LA - 1}{L + A - 2}$, $A > 1/L$, and

(8) $C = -\dfrac{LA - 1 + L(1 - LA)}{L + A - 2 + (1 - LA)}$, $A < 1/L$,

where $L$ is the overload into which the filter is working and is equal to $s/r_a$, and $A$ is the proportion of selections which are correct, or $t_{aa}/r_a$. Thus the parameters $L$ and $A$ are determined from three experimentally available quantities: $s$ is the total number of items available for filtering, $r_a$ is the total number of selections, and $t_{aa}$ is the number of correct selections.
Either the actual load $s$ or the normalized overload $L$ is usually the independent or control variable. Either the raw or the normalized number of correct selections ($t_{aa}$ or $A$) is the dependent variable of interest. The coherence measure is simply a corrected value of $A$ which takes random guessing into account. Since the probability of a wrong guess is a mathematical function of the overload, $L$ appears in the measure. However, $C$ is not a measure of the relationship between $L$ and $A$. $C$ varies between plus and minus one only because it is normalized to allow comparisons among several experiments, and not because it is a coefficient of correlation.

Coherence may be determined from experimental measurements, and thus the statistical behavior of $C$ is of interest. Since $C$ is a corrected value of $A$, each value $C = C_i$ has the same probability as the experimental value $A = A_i$ from which it was computed according to (6) or (8). That is,

(9) $\Pr[C = C_i] = \Pr[A = A_i]$.

Thus it follows directly ([2], p. 172) that the mean and variance of $C$ can be computed in the usual way, even though the probabilities of values of $C$ are not measured directly. The mean and variance exist for all $L > 1$; when $L = 1$ there is no overload and thus no filtering.
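Relation (9) means the distribution of $C$ over repeated trials is inherited from that of $A$. The mean and variance of $C$ can therefore be obtained by converting each trial's observed $A_i$ to $C_i$ and summarizing. The Python sketch below uses invented trial counts, floating-point arithmetic, and the population variance. Whether to use the $N - 1$ divisor instead is a choice the paper does not address.

```python
from statistics import fmean, pvariance


def coherence(L: float, A: float) -> float:
    """(6) for A >= 1/L; (8), via the image value (7), for A < 1/L."""
    if L <= 1:
        raise ValueError("coherence requires overload, L > 1")
    if A >= 1 / L:
        return (L * A - 1) / (L + A - 2)
    a_img = 1 + A - L * A
    return -(L * a_img - 1) / (L + a_img - 2)


def coherence_summary(trials):
    """Mean and population variance of C over trials given as (s, r_a, t_aa).
    By (9) each C_i carries the probability of the A_i it came from, so the
    usual statistics of the C_i estimate the mean and variance of C."""
    values = [coherence(s / r_a, t_aa / r_a) for s, r_a, t_aa in trials]
    return fmean(values), pvariance(values)


# Hypothetical trials at overload L = 2 (s = 20 items, r_a = 10 selections).
mean_c, var_c = coherence_summary([(20, 10, 8), (20, 10, 10), (20, 10, 6)])
print(round(mean_c, 4), round(var_c, 4))
```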

A negative value of coherence would appear to be logically inconsistent. However, it can occur for two reasons. The number of right guesses is determined from a computation in which the actual number of errors is taken as the best likelihood estimate of the theoretically expected number of errors. This is certainly the best estimate that can be made, but fluctuations about this value must be expected. Thus, if the human filter makes a disproportionately large number of bad guesses, the measure can actually go negative. Such fluctuations should be cancelled in the long run by the high values of $C$ that result when the filter makes a disproportionately large number of good guesses.


The measure of coherence may be used to study random behavior as a function of load and of other variables in filtering tasks. Also, it is quite possible that some individuals behave more coherently in general than others. If so, assessing an individual's coherence level might permit improved selection of personnel for filtering tasks.

In fact, any time one is concerned with adherence to rules or with partially random activity in overloaded information handling systems, the coherence measure should be a useful quantitative tool.


REFERENCES 


[1] Conrad, H. S. Investigating and appraising intelligence and other aptitudes. In T. G. Andrews (Ed.), Methods of Psychology. New York: Wiley, 1948. Ch. 17.

[2] Feller, W. An introduction to probability theory and its applications, Vol. I. New York: Wiley, 1950.

[3] Quastler, H. and Buckley, E. P. Private communications concerning material in preparation for publication.


Manuscript received 8/23/56
Revised manuscript received 2/25/57


PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


A CORRELATIONAL ANALYSIS OF TRACKING BEHAVIOR 


W. B. KNOWLES*,
J. G. HOLLAND,
AND
E. P. NEWLIN†


U. S. NAVAL RESEARCH LABORATORY


This study reveals the usefulness of multiple correlation techniques in 
estimating the relative importance of different aspects of a tracking task in 
the operator’s tracking behavior. The technique is applied to a compensatory 
tracking task with a position control. 


This study is designed to ascertain the adequacy of multiple correlation techniques in estimating the relative importance of different aspects of a tracking situation to the operator’s tracking behavior. The tracking task chosen for analysis is a compensatory task. The operator is required to manipulate a joy stick so as to keep an externally driven target dot centered on a hairline reference located on the face of a cathode ray tube. A direct position control is used, so that the tracking error (i.e., the displacement of the dot from the hairline) is equal to the difference between the course displacement and the control displacement.

In this task the operator might be capable of extracting usable information from the visual display: the magnitude and direction of the displacement of the dot, the speed and direction in which the dot is moving, and possibly even changes in the speed. That is to say, the visual stimuli to be evaluated are the instantaneous values of the error $e$ and its first and second derivatives $\dot{e}$ and $\ddot{e}$. In addition, the operator may base his responses partly on information contained in the several components of the stick motion, i.e., instantaneous stick displacement $R$ and its first and second derivatives $\dot{R}$ and $\ddot{R}$. It is further assumed that each of these six variables would have its influence on performance after some delay, roughly analogous to a “reaction time” for each variable in question. These delay times, which must be revealed by the analysis, are left free to assume different values for each of the assumed variables.


*Now with General Electric Advanced Electronics Center.

†The authors wish to express their appreciation of the many persons who generously offered encouragement and advice. Particularly helpful were Harold Glaser of the U. S. Naval Research Laboratory and W. J. McGill of the Massachusetts Institute of Technology. The authors are also deeply indebted to ..., who performed the many long and laborious statistical computations, and who deserves more than this footnote.


The specific measure taken as the performance criterion, or dependent variable, is the acceleration pattern of the stick movements. Intuitively, it seems that a response should be a change from one state to another. Use of the acceleration measure assumes that the operator maintains a rate of movement until an appreciation of the stimulus situation leads him to generate a different rate or direction of movement. Birmingham and Taylor [1], dis...

Because the independent stimulus variables are necessarily interdependent, experimental isolation is not possible. Multiple correlation techniques circumvent this difficulty, permitting statistical isolation even when experimental isolation is impossible (cf. [3] and [7]). This works because each partial regression coefficient measures the regression of the dependent variable on one independent variable, with the influence of the other independent variables held constant.

The use of this index can best be illustrated by returning to the original problem posed for this study: to relate the performance measure $\ddot{R}_t$ to each of the assumed variables. This is done with two formulations, the first using only display variables and the second using both display and movement variables. The specific statements of these relationships take the form

(1) $\ddot{R}_t = a\,e_{t-\tau_1} + b\,\dot{e}_{t-\tau_2} + c\,\ddot{e}_{t-\tau_3}$,

(2) $\ddot{R}_t = a\,e_{t-\tau_1} + b\,\dot{e}_{t-\tau_2} + c\,\ddot{e}_{t-\tau_3} + d\,R_{t-\tau_4} + f\,\dot{R}_{t-\tau_5} + g\,\ddot{R}_{t-\tau_6}$,

where the $\tau_i$ are the lags associated with the respective variables. Equations (1) and (2) are recognizable as multiple regression equations. Generalized, they are equivalent in form to

(3) $X_1 = \beta_{12.34\cdots n}X_2 + \beta_{13.24\cdots n}X_3 + \cdots + \beta_{1n.23\cdots(n-1)}X_n$,

where $X_1$ is the dependent or criterion variable, and $X_2, X_3, \ldots, X_n$ are the independent or predictor variables. The beta coefficients are the desired partial regression coefficients. They represent the net weight to be assigned to each predictor when the simultaneous effects of all the other predictors are held constant. Suppose (1) and (2) are interpreted as multiple regression equations, and the coefficients $a$, $b$, $c$, etc., as partial regression coefficients. Each coefficient will then represent the degree to which $\ddot{R}$ depends on the variable in question, with all the other variables in that equation held constant. That is, these coefficients will indicate the relative weight of each variable in determining the response.

A check on the adequacy of any particular solution is afforded by the multiple correlation coefficient, which can be calculated for each solution. This index is the correlation between the observed values of the dependent variable and the predicted values obtained from the multiple regression equation. It thereby provides a measure of the precision of the predictions. For example, if the multiple correlation for (2) should not be significantly higher than the multiple correlation for (1), this would indicate that only the display variables were active in determining the responses. But if the multiple correlation for (2) should be significantly higher than that for (1), this would indicate that both display and movement variables are of importance in determining the responses. And if, in either case, the multiple correlation coefficient is small, this would indicate that neither formulation is adequate. Truly critical variables would then probably not have been included.
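The paper does not say which significance test it has in mind for comparing the multiple correlations of (1) and (2). One standard choice for nested least-squares equations, offered here only as an assumption, is the F statistic for the gain in $R^2$ when predictors are added. Equation (1) has 3 predictors and (2) has 6. Successive samples of a tracking record are serially correlated, and the F test assumes independent observations. The nominal degrees of freedom therefore overstate the information, and any F computed this way should be read cautiously. The function name is hypothetical.

```python
def r2_increment_f(r2_reduced: float, r2_full: float,
                   p_reduced: int, p_full: int, n_obs: int) -> float:
    """F statistic for the increase in R^2 when the predictors of a reduced
    equation (e.g. (1), 3 predictors) are augmented to those of a full
    equation (e.g. (2), 6 predictors), both fitted to n_obs observations."""
    df1 = p_full - p_reduced   # number of added predictors
    df2 = n_obs - p_full - 1   # residual degrees of freedom of the full equation
    if df1 <= 0 or df2 <= 0:
        raise ValueError("need p_full > p_reduced and n_obs > p_full + 1")
    return ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)


# Hypothetical multiple correlations, squared before use: R(1) = .60, R(2) = .65.
f_stat = r2_increment_f(0.60 ** 2, 0.65 ** 2, p_reduced=3, p_full=6, n_obs=285)
print(round(f_stat, 2))
```

The resulting value would be referred to an F table with (3, 278) degrees of freedom in this hypothetical case.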

There remains the matter of determining the appropriate lag associated with each independent variable; this presents a major problem in the evaluation of the above equations. The lags are found by first calculating the correlation functions between $\ddot{R}$ and each independent variable $e$, $\dot{e}$, etc., lagged over the period in which significant relationships are at all likely to be found. The spuriously high correlations at zero lag are ignored. The interval at which the maximum correlation between the dependent variable and the given independent variable occurs is selected as the appropriate lag for that predictor. An alternative procedure would avoid this somewhat arbitrary selection of lags. It involves trying all combinations of lags and accepting the combination resulting in the largest multiple correlation coefficient. Calculating such solutions would have been a prodigious task, quite beyond the practical limits of the experiment. The few additional likely combinations that were tried yielded lower multiple correlations. The procedure of using lags based on the highest correlations was therefore adopted.
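The lag-selection rule just described amounts to three steps. First, compute a lagged Pearson correlation at each candidate lag, skipping lag zero. Then keep the lag of the extreme value. The Python sketch below keeps the most negative correlation, the choice the authors report making later for these data. The signals are synthetic. The 0.06 sec. sample interval is the reading interval given under Analysis of Records. Function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(criterion, predictor, lag: int) -> float:
    """r between the criterion at sample t and the predictor at sample t - lag."""
    if lag == 0:
        return pearson(criterion, predictor)
    return pearson(criterion[lag:], predictor[:-lag])


def most_negative_lag(criterion, predictor, max_lag: int) -> int:
    """Lag in 1..max_lag giving the most negative correlation; lag 0 is skipped,
    as the text ignores the spuriously high zero-lag values."""
    return min(range(1, max_lag + 1),
               key=lambda k: lagged_correlation(criterion, predictor, k))


SAMPLE_INTERVAL = 0.06  # seconds between readings

# Synthetic signals: the criterion mirrors the predictor 3 samples later.
predictor = [((7 * i) % 11) - 5 for i in range(60)]
criterion = [0, 0, 0] + [-v for v in predictor[:-3]]
lag = most_negative_lag(criterion, predictor, max_lag=10)
print(lag, round(lag * SAMPLE_INTERVAL, 2))  # 3 0.18
```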


Method
Apparatus
A one-dimensional tracking system with auxiliary multi-channel recording equipment was set up on electronic analog computing equipment, as outlined in the block diagram in Fig. 1.


FIGURE 1
Simplified Block Diagram of Apparatus
[Block labels legible: COURSE INPUT; RECORDER]


The course input ... periods of 5.12 sec. and ... which held the equipment. A 1° displacement ... produced a displacement of the error dot subtending a visual angle of 0.595°. Stick displacement was transduced into voltage output by use of a vacuum tube strain gauge, RCA 5734. Simultaneous recordings of $e$, $\dot{e}$, $\ddot{e}$, $R$, $\dot{R}$, and $\ddot{R}$ were taken with a six-channel Brush polygraph run at a paper speed of 1 cm./.94 sec.


Procedure 


Two naval enlisted men served as subjects and were given 22 sessions of five 1-min. trials over the course of seven days. An integrated error score was obtained for each trial as a gross measure of performance. Polygraph records were taken only during the last 30 sec. of the trials composing sessions 21 and 22. By this time the learning curves appeared to have reached an asymptote.

A representative record, free from artifacts, was selected for each subject and submitted to analysis.


Analysis of Records


Corresponding sections of the two records were read at intervals equivalent to 0.06 sec. Two independent readings were made and collated into a reasonably accurate final tabulation of the six variables at each of 310 points covering 18.9 sec. of tracking. A separate analysis was made for each subject.

Correlation functions (Pearson product moment correlations) were then 
computed between each variable and every other variable over 25 lags of 
0.06 sec. or a total of 1.5 sec. The computations were programed on the 
NAREC digital computer. 


Results 


A sample tracking record of the six variables for S, is presented in Fig. 
2. The record for the other S had much the same appearance and does not 
reveal any obvious differences between the two Ss. However, the integrated 
error scores showed that S, was consistently better than S, . The correlation 
functions are also somewhat different for the two Ss, as can be seen in Fig. 3,
where the correlations between R̈ₜ and each predictor are presented.* The
primary difference seems to be in the nature of the dominant periodicities
of the correlograms. Otherwise the corresponding functions seem remarkably
similar, at least up to lags of about 0.75 sec., the interval of maximum interest.
In both cases the highest correlations are observed between R̈ₜ and ė at only
slightly different lags. Any given point on these curves must exceed an
absolute value of .154 to reach the 1-per cent level of significance; because of the
large number of correlations involved, a more stringent standard should be
set.
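The .154 threshold can be approximated from the number of correlated pairs. A minimal sketch, assuming Fisher's z approximation rather than whatever table the authors used, and assuming n between 285 (310 readings less 25 lag steps) and 310 pairs; the text does not say which n underlies .154, so the results are only expected to land near it.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n_pairs, alpha=0.01):
    """Approximate two-tailed critical |r| for a Pearson r from n_pairs pairs.

    Uses Fisher's z transformation: when the population correlation is zero,
    atanh(r) is roughly normal with standard deviation 1 / sqrt(n - 3).
    """
    if n_pairs <= 3:
        raise ValueError("need more than 3 pairs")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n_pairs - 3))


for n in (285, 310):
    print(n, round(critical_r(n), 3))  # 285 0.152, then 310 0.146
```

Both values are close to the quoted .154. For the more stringent standard the text calls for, one option (not the authors') is to divide `alpha` by the number of correlations examined before calling `critical_r`.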

From these functions, particular raw correlation values and their lags
were selected for use in evaluating equations (1) and (2). Since the tracker
must generate responses opposite in direction to the displayed error in order
to keep the dot on the hairline, the coefficients for e, ė, and ë in these multiple
regression equations must be negative. There were no known a priori
restrictions on the signs of the movement variables, but, as it turned out, the
highest correlations for these predictors were also negative. This being so,
the highest negative correlations between R̈ₜ and each predictor, together
with the lags at which these correlations occurred, were selected for use in
the analyses. Inspection of the scatter plots for these raw correlations
suggested that the assumption of linear regression was met. These raw correlations
were then processed by the Doolittle method [cf. 7] to solve for the constants of
(1) and (2).
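The Doolittle method is a systematic elimination scheme for the normal equations R_xx β = r_xy, where R_xx holds the intercorrelations of the lagged predictors and r_xy their correlations with R̈ₜ. A minimal sketch in modern terms, assuming the correlations are already computed: it uses a Doolittle-type LU factorization without pivoting (adequate for a well-conditioned correlation matrix), and the function names are hypothetical. The predictor intercorrelations were deposited with ADI rather than printed, so the sketch cannot reproduce equations (4) to (7) themselves.

```python
def doolittle_lu(a):
    """Doolittle LU factorization a = L U, with ones on the diagonal of L.

    No pivoting: adequate for a positive-definite correlation matrix.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):  # row i of U
            upper[i][k] = a[i][k] - sum(lower[i][j] * upper[j][k] for j in range(i))
        lower[i][i] = 1.0
        for k in range(i + 1, n):  # column i of L
            lower[k][i] = (a[k][i] - sum(lower[k][j] * upper[j][i] for j in range(i))) / upper[i][i]
    return lower, upper


def beta_weights(r_xx, r_xy):
    """Solve the normal equations r_xx @ beta = r_xy for standardized weights."""
    lower, upper = doolittle_lu(r_xx)
    n = len(r_xy)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = r_xy
        y[i] = r_xy[i] - sum(lower[i][j] * y[j] for j in range(i))
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution: U beta = y
        beta[i] = (y[i] - sum(upper[i][j] * beta[j] for j in range(i + 1, n))) / upper[i][i]
    return beta
```

For example, two predictors correlated .5 with each other and .6 and .4 with the criterion give weights 8/15 and 2/15.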

Entering the constants in (1) for S₁ yielded

(4) $\ddot{R}_t = -.095e - .621\dot{e} - .017\ddot{e}$,


*The NAREC digital computer provided intercorrelations for all combinations of
the six variables, each at 26 lags ranging from 0.00 to 1.50 sec. A complete [...] of
these intercorrelations has been deposited with the American Documentation Institute.
Order Document No. 5206 from the ADI Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C., remitting in advance $1.75 for 35 mm.
microfilm or $2.50 for 6 × 8 in. photocopies. Make checks payable to Photodupli-
cation Service, Library of Congress.


280 PSYCHOMETRIKA 


FIGURE 2
A Typical Portion of the Tracking Record for S,
(Traces, top to bottom: error, e; velocity of error, ė; acceleration of error, ë; stick displacement, R; velocity of stick displacement, Ṙ; acceleration of stick displacement, R̈. Time marks at 60 sec. intervals.)


FIGURE 3
Plots of raw correlations between each independent variable and the dependent variable (R̈ₜ) as a function of lag time. The broken line represents S₁'s correlations and the solid line represents S₂'s correlations. (Axes: correlation, r, against lag in seconds.)


where the lags for the respective e's are −.06, −.24, and −.36 sec. The
corresponding result for S₂ is

(5) $\ddot{R}_t = -.192e - .455\dot{e} - .014\ddot{e}$,

where the lags for the respective e's are −.06, −.18, and −.24 sec. The
multiple correlation coefficients are .675 for S₁ and .566 for S₂. The amount
of variance in R̈ₜ accounted for by these multiple correlations is about 46
per cent and 32 per cent, respectively. The predictions are considerably
better than chance but still represent rather crude approximations.
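With standardized weights β from the normal equations, the squared multiple correlation is the sum of each weight times its raw correlation with the criterion, R² = Σ βᵢ rᵢ. A minimal sketch using the S₂ raw correlations and weights printed in Table 2 (matched to e, ė, ë in the order of equation (5)); rounding in the printed values means the result only approximates the reported .566.

```python
from math import sqrt


def multiple_r(beta, r_xy):
    """Multiple correlation from standardized weights: R^2 = sum(beta_i * r_iy)."""
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    return sqrt(r_squared)


# S2, display variables only: weights and raw correlations as printed in Table 2
beta_s2 = [-0.192, -0.455, -0.014]
r_s2 = [-0.384, -0.543, -0.234]
print(round(multiple_r(beta_s2, r_s2), 3))  # 0.569, against .566 reported
```

Squaring the reported coefficients gives the variance figures in the text: .675² ≈ .46 and .566² ≈ .32.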

The data presented in Tables 1 and 2 show that in the equation for
S₁ only the partial regression coefficient for ė is significantly greater than
zero. In the formulation for S₂ the coefficients for e and ė are both significantly
greater than zero. Furthermore, of the total amount of variance accounted
for by these equations, in the case of S₁ about 85 per cent is attributable to
variations in ė, and in the case of S₂, 69 per cent is attributable to ė and
only 29 per cent to e.

TABLE 1
Results of the Partial Correlation Analysis for S₁ Using Only the Display Variables

Variable | Raw Correlation | Regression Coefficient (β) | Standard Error of β | t | Significance Level of β
e | | −.095 | .050 | 1.919 | -
ė | | −.621 | .054 | 11.500 | .01
ë | | −.017 | .049 | 0.349 | -

TABLE 2
Results of the Partial Correlation Analysis for S₂ Using Only the Display Variables

Variable | Raw Correlation | Regression Coefficient (β) | Standard Error of β | t | Significance Level of β | Partial Correlation*
e | −.384 | −.192 | .054 | 3.529 | |
ė | −.543 | −.455 | .060 | 7.533 | |
ë | −.234 | −.014 | .055 | 0.253 | |

* [...] offer no problem of interpretation.

Solutions for (2), where both display and movement variables are
included, yielded for S₁

(6) $\ddot{R}_t = -.579e - .116\dot{e} - .054\ddot{e} - .335R - .450\dot{R} - .046\ddot{R}$,


where the lags for the respective e's are −.06, −.24, and −.36 sec., and the
lags for the respective R's are −.06, −.24, and −.36 sec. For S₂,

(7) $\ddot{R}_t = -.498e - .182\dot{e} - .014\ddot{e} - .352R - .355\dot{R} - .026\ddot{R}$,

where the lags for the respective e's are −.06, −.18, and −.24 sec., and the
lags for the respective R's are −.06, −.24, and −.24 sec. The multiple
correlations when all predictors are used are .757 for S₁ and .722 for S₂. The
amount of variance accounted for is 57 per cent and 52 per cent, respectively.
An F test [cf. 6] reveals that these multiple correlations are significantly
larger (p < .001) than those obtained by using only the display variables,
and indicates that the six-variable equations should be regarded as closer
approximations to an adequate description of the factors influencing the
operator's tracking performance.
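The text does not spell out its F test beyond citing [6]. A common form for testing whether added predictors raise R² is sketched below; it is a minimal sketch under an assumed n of 285 pairs (the usable number after lagging is not given in the text), so the F values are illustrative only, and the function name is hypothetical.

```python
def f_for_added_predictors(r_full, r_reduced, k_full, k_reduced, n):
    """F statistic for the gain in R^2 when predictors are added.

    F = [(R2_full - R2_reduced) / (k_full - k_reduced)]
        / [(1 - R2_full) / (n - k_full - 1)]
    Returns (F, df1, df2).
    """
    r2_full, r2_reduced = r_full ** 2, r_reduced ** 2
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2


N_ASSUMED = 285  # assumption: usable pairs after lagging; not given in the text
for label, r_six, r_three in (("S1", 0.757, 0.675), ("S2", 0.722, 0.566)):
    f, df1, df2 = f_for_added_predictors(r_six, r_three, 6, 3, N_ASSUMED)
    print(label, round(f, 1), df1, df2)
```

Under that assumption both F values (roughly 25 and 39) lie far above the .001 critical value of F for 3 and a few hundred degrees of freedom (roughly 5.5), consistent with the reported p < .001.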

The most striking feature of these equations is the large shift in the
regression coefficients for e and ė resulting from the inclusion of the movement
variables. The error e is the single most important determiner of R̈ₜ, accounting for
37 per cent of the predictable R̈ₜ variation in the case of S₁ and 35 per cent
in the case of S₂. On the other hand, ė is a negligible factor in the equation
for S₁, and in the equation for S₂ it accounts for only 13 per cent of the total
predictable R̈ₜ variance, even though the regression coefficient is reliably
greater than zero (see Tables 3 and 4). In both cases, R and Ṙ now assume
considerable importance. The amount of predictable variation attributable
to the R factor is 21 per cent for S₁ and 25 per cent for S₂. Comparable figures
for Ṙ are 28 per cent for S₁ and 25 per cent for S₂. Thus, in this tracking task
at least, the acceleration pattern of the control movements seems to be
determined largely by e, R, Ṙ, and possibly to a lesser extent by ė. The
adequacy of this formulation is shown graphically in Fig. 4, where the pre-
dicted points are plotted against two samples of actual tracking record for
S₁. The greatest error appears to be a failure to predict a high frequency
component of the response, although the fit to a smoothed record would
probably be quite good.


Discussion 


The results indicate the degree to which the acceleration pattern of the
control movements can be predicted assuming known values of the other
variables at some time previous, when these variables are taken singly or
in combination according to some assumed first-order linear scheme.

The individual correlations show that if a prediction is to be made from
only a single variable, the best results can be obtained from values of error
velocity ė taken at a lag of about .20 sec. The solutions of (1) indicate that
only slightly better predictions can be made using all three error measures,
but that ė still carries the most weight. The regression equations (2), with
both e and R measures, give considerably better predictions and indicate
shifts in the relative weightings given to the e variables. It is now found that
R̈ₜ is most heavily dependent upon the error .06 sec. earlier, together with the
simultaneous position of the stick, also .06 sec. earlier, and its velocity,
about .24 sec. earlier.

TABLE 3
Results of the Partial Correlation Analysis for S₁ Using Both the Display and the Control Movement Variables

Variable | Lag | Raw Correlation | Regression Coefficient (β) | Standard Error of β | Significance Level of β | Partial Correlation
−.374  −.579
−.406
−.675
−.116  −.089
−.282  −.054
−.059
−.361
−.335  −.347
−.298
−.450  −.352
−.226  −.046
−.051

TABLE 4
Results of the Partial Correlation Analysis for S₂ Using Both the Display and the Control Movement Variables

Raw Correlation | Regression Coefficient (β) | Standard Error of β | Significance Level of β | Partial Correlation

The multiple correlations found in [...] high. Even so, a sizable amount of
residual [...] is expected from a number of sources. So[...] either as systematic
variability due to vari[...] the recording equipment. Also, analog computers are
subject to drift and are particularly noisy when performing double differentiations.
Needless to say, precautions were taken to keep this source of variation as small as
possible. Undoubtedly some error was involved in reading the records. This factor
was checked by re-reading one hundred points on the R̈ and ė records for
S₂, calculating reliability coefficients, [...] theoretical


correlation between R̈ₜ and ė (with lag −.06) corrected for attenuation due
to measurement errors [cf. 7]. The reliability coefficients were .986 for R̈
and .994 for ė, and the raw correlation between R̈ₜ and ė (with lag −.06),
had it been corrected for attenuation, would have changed from −.543 to
−.548. Although the correction is small, it nevertheless illustrates that some
small part of the residual variance may be accounted for by errors in record
reading. A related non-systematic source of error was introduced by the .06
sec. grain used in quantizing the continuous records and in lagging the
correlations. The point of maximum correlation between any two variables
is estimated to within .03 sec. This factor may have affected slightly not only
the size of the maximum correlations but also the choice of the lags and the
values of the intercorrelations. The net effect may have been to lower some-
what the size of the partial and multiple correlations. Finally, the moment-
to-moment variation of the subject's behavior, a characteristic to be expected
of all human behavior, constituted still another source of random variance.
These several sources of variance would generally tend to lower the raw,
partial, and multiple correlations, although they should have little influence
on the relative importance assigned to each variable. It is not possible to
determine whether or not the above sources are responsible for all of the
residual variance. In analyses such as these a possibility always remains that
some additional variable may be active in determining the response. For
example, comparison of the predicted and obtained points in Fig. 4 [...]
that a high frequency component of R̈ₜ, approximately 10 cycles per second,
is not being predicted. These frequencies are about those of normal [...]
tremor. Effectively, the tremor, if indeed this is the source of these
frequencies, has been treated as a random noise source and no attempt has
been made to account for it by factors in the multiple regression [...].
At any rate this is a variable which, though possibly a significant source of
variance, would not be expected to alter the relative influences of the other
variables.

FIGURE 4
Two typical samples of S₁'s R̈ record with points predicted by the multiple regression equation using both display and control movement variables. (Solid line: observed; dots: predicted. Time marks at 60 sec. intervals.)
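The correction for attenuation described above divides the observed correlation by the square root of the product of the two reliabilities. A minimal sketch of that standard formula, applied to the reliabilities (.986 and .994) and raw correlation (−.543) quoted in the text; the function name is hypothetical.

```python
from math import sqrt


def disattenuate(r_xy, r_xx, r_yy):
    """Correction for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / sqrt(r_xx * r_yy)


print(round(disattenuate(-0.543, 0.986, 0.994), 3))  # -0.548
```

Because both reliabilities are close to 1, the denominator is about .990 and the correlation barely moves, which is the point the text makes about record-reading error.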


[...] of the dot is given a larger weighting. [...] redundant with the R variables;
[...] the R variables for S₁, and for [...] response variables in the analysis
allows this redundancy to be partialled out. It is revealed that the major
source of display information is the displacement, and that this information
is of use only in combination with response information from R and Ṙ.

The maximum correlation of R̈ₜ with e and with R occurs in both cases
at .06 sec., which is extremely short compared to the typical "choice" reaction
times [8]. It may be that a continuous tracking task with a fairly simple
course provides a high degree of "readiness to respond" and thus might well
involve lower than usual reaction times.
The utility of this proposed correlational analysis technique is best
seen by comparing it with other attempts to discover the factors determining
the operator's performance [2, 4, 5]. These attempts derived largely from the
[...] transfer functions in engineering [...] of each input frequency with
[...] harmonics at the output. [...] "noise" and represent non-linearities
[...] the techniques fail to account for. It [...] the present analysis wherein the
[...] analysis techniques and the [...] display variables alone do permit a low
order, approximate prediction of the operator's performance and therefore
may have some practical utility. However, the results of the present study
demonstrate that there is considerable danger of obtaining a badly distorted
view when only the display variables are used. If one is interested in knowing
the relative importance of different display variables, then omitting the R
variables leaves the possibility that the relative weights obtained will be
confounded by the influence of the R variables. An additional advantage of
the correlation technique when the R variables are included is that it accounts
for at least some of the sources of non-linearity. Thus, from a strictly practical
point of view, the correlation technique gives a closer approximation to the
operator's behavior and might, therefore, provide a closer prediction of how
the operator will fit into a given system.

The multiple correlation techniques should also be useful in investigations 
of changes in performance which might occur when different controls are 
employed. What variables determine the response when an acceleration or 
velocity control is used rather than the position control used here? What 
variables determine the response when an aided or quickened control [cf. 1]
is used? It may be that performance with a position control and with a 
properly aided control are dependent upon the same variables. The correla- 
tional analyses could be used to test this hypothesis. In addition, the technique 
could be used to describe changes in the dependencies with learning and the 
manner in which these dependencies alter when an operator is transferred 
from one type of a control to another. Furthermore, the technique should be 
useful in evaluating or formulating theories of human tracking behavior. 
These and other uses of the present technique could contribute to a clari- 
fication of the nature of different tracking tasks and provide estimations of 
how the human operator fits into different kinds of tracking systems. 


REFERENCES 


[1] Birmingham, H. P. and Taylor, F. V. A design philosophy for man-machine control
systems. Proc. Inst. rad. Engrs., 1954, 42, 1748-1758.
[2] Elkind, J. I. Characteristics of simple manual control systems. Tech. Rep. No. 111,
MIT Lincoln Laboratories, 6 April 1956.
[3] Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936.
[4] Krendel, E. S. A preliminary study of the power-spectrum approach to the analysis
of perceptual-motor performance. AF Tech. Rep. No. 6723, WADC, Oct. 1951.
[5] Krendel, E. S. The spectral density study of tracking performance. I. The effect of
instructions. AF Tech. Rep. No. 52-11, Part 1, WADC, Jan. 1952.
[6] McNemar, Q. Psychological statistics. New York: Wiley, 1949.
[7] Peters, C. C. and Van Voorhis, W. R. Statistical procedures and their mathematical
bases. New York: McGraw-Hill, 1940.
[8] Woodworth, R. S. Experimental psychology. New York: Holt, 1938.

Manuscript received 2/8/57


PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


A MEASURE OF THE GAMBLING RESPONSE-SET IN
OBJECTIVE TESTS 


ROBERT C. ZILLER 
UNIVERSITY OF DELAWARE 


A formula is developed for measuring the gambling response-set or
utility for risk in objective tests in which the testees are apprised of the
application of a correction for guessing. Some implications of this measure
for test theory and construction are discussed briefly.


In 1950 Gulliksen [3] suggested that useful scores might be developed
from the number of items skipped in an objective examination. Moreover,
he suggested that, "Such scores, coupled with careful test directions, may
indicate 'cautiousness' or some similar personality characteristic of the
subjects." Actually three earlier studies on this topic had been reported
[6, 7, 9] and two studies have been reported recently [5, 8]. This paper sum-
marizes these studies, develops a formula which provides a measure of the
personality characteristic indicated, and discusses some implications of this
measure for test theory and construction.

In 1936 Votaw [9] presented data demonstrating a relationship between
measures of dominance-submission and guessing behavior on a test in which
the subjects were directed not to guess. In this report the measure of guessing
behavior was simply the sum of the unattempted items.

An improved measure of guessing behavior was reported by Swineford
in 1938 [6]. Through special directions on an achievement test, subjects were
required to indicate their confidence in their response to each item by select-
ing the amount of credit desired (2, 3, or 4 units). Double the credit claimed
was subtracted if the response was incorrect. The "gambling tendency"
was derived from the following formula:


(1) $G = \dfrac{(\text{errors marked } 4) \times 100}{\text{total errors} + [\ldots]\,\text{omissions}}$


In this index it was assumed that all errors were guesses. The reported
reliability of this measure was .796.

In 1941, Swineford [7] subjected this index to further analysis and
concluded that ability in the field covered by the test is independent of the
tendency to gamble. Moreover, the results suggested that familiarity with
the material has some effect on the tendency to gamble; [...] types of material
and tests seemed to encourage gambling or guessing behavior.

More recently [5] an index of maladjustment derived from selected
MMPI items was found to correlate significantly and negatively [...]
behavior on an achievement examination in which the use of a correction
[...] correction formula is correlated with personality
variables derived from a biographical inventory. A "high risker" appears to
be a self-confident, physically and socially adequate, competitive, self-
expressive, secure individual who strongly identifies with the masculine role.
(The subjects were Air Force aircrew members.)


In general, it must be concluded that in examinations in which a correc-
tion formula is imposed with the knowledge of the t[...]
is introduced in the form of a personality trait. This trait may be tentatively
described as "utility for risk." [...] acceptance for any object[...]
formula. Swineford's approa[...]ed direction. However, the [...]ramework and
[...]est. As a result, the formula is not sensitive
to all the possible variances attributable to risk-acceptance. [...]
[...]ving formula was developed:*

(2) $R = \dfrac{[k/(k - 1)]\,W}{[k/(k - 1)]\,W + U},$

where R = the index of risk-acceptance,
k = the number of alternatives,
W = the number of incorrect responses,
U = the number of items omitted.

[...] upon which the subject guessed to the total number of items the subject did
not know, but upon which he could have guessed. That is:

(3) $R' = \dfrac{\text{true number of questions guessed}}{\text{true number of questions not known}}.$

*The author wishes to acknowledge the assistance of Thornton Roby, Tufts University,
in the development of this index.


ROBERT C. ZILLER 291 


It is assumed that an incorrect response is entirely a chance response, and
that all the alternatives and items are equally difficult. It should be empha-
sized, however, that R in (2) is an estimate of R′, and therefore subject to
chance fluctuations. Thus, when the total number of questions guessed is
large, R may be expected to be close to R′. However, when the total number
of questions guessed is small, the numerical difference between R′ and R
may be considerable; that is, the measure may lack reliability under this
condition. Furthermore, it should be noted that the general formula cannot
be employed in the case where W = U = 0.

Thus, the number of guesses may be expressed in terms of the number of
errors. For example, with reference to a true-false examination or test with
two-alternative items, the number of errors is only one-half of the chance
responses, assuming that the remaining half of the responses were correct
guesses.

Let G = the number of guesses.
Then W = G/2,
G = 2W,
and R = 2W/(2W + U).
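Under the stated assumptions, every wrong answer comes from a guess and a guess among k alternatives is wrong with probability (k − 1)/k, so the expected number of guesses is G = kW/(k − 1); with k = 2 this reduces to G = 2W as above. A minimal sketch of the resulting index R = G/(G + U), assuming formula (2) takes this form (it agrees with the two-alternative case); the function name is hypothetical.

```python
def risk_acceptance(wrong, omitted, k):
    """Risk-acceptance index R = G / (G + U), with G = k * W / (k - 1).

    wrong   -- W, the number of incorrect responses (all assumed to be guesses)
    omitted -- U, the number of items omitted
    k       -- the number of alternatives per item
    """
    if k < 2:
        raise ValueError("items need at least two alternatives")
    if wrong == 0 and omitted == 0:
        raise ValueError("index undefined when W = U = 0")
    guesses = k * wrong / (k - 1)  # estimated number of items guessed
    return guesses / (guesses + omitted)


print(risk_acceptance(10, 20, 2))  # 0.5, same as 2W / (2W + U)
print(risk_acceptance(9, 8, 4))    # 0.6: 12 estimated guesses, 8 omissions
```

The guard for W = U = 0 mirrors the restriction stated in the text.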


Risk behavior, as measured by the general formula, is necessarily a
factor in any examination involving an announced correction for guessing.
Moreover, since it has been demonstrated to be a personality correlate, it
contributes only to the error variance of achievement examination scores.
This has been recognized for some time, even though it is not often considered
when modified response methods for multiple choice items are suggested
[1, 2]. However, the measure derived through the risk-acceptance formula
permits more systematic analysis of problems concerning the effects of
guessing on test scores.

From the general formula it is apparent that risk-acceptance is a function
of the number of items omitted U and the number of items marked incorrectly
W. Necessarily, U and W are functions of the difficulty of the test items.
Therefore, test score variance attributable to variance in risk-acceptance
is a direct function of the difficulty of the test items. As the difficulty of an
examination approaches .00, that is, when the items cannot be answered on
the basis of knowledge, understanding, etc., risk-acceptance becomes the
single correlate of the examination score.

Thus, under conditions in which a correction for guessing is imposed, 
the element of risk is introduced and leads to increasing error variance with 
increasing test difficulty. However, research to date indicates that increasing 
test difficulty to a given level under constant conditions of spread of difficulty 
and item correlation will also increase test validity [4]. Yet most of the 
analytical investigators summarized in this report employ mathematical 
models which do not consider the condition of risk described here. Empirical
[...] the foregoing discussion, [...] difficulty under the invariant [...]

Aside from the implications for test theory [...]
provides a unique and useful measure of a pe[...]


REFERENCES 


[1] Coombs, C. H. On the use of objective examinations. Educ. [...] 308-310.
[2] Dressel, P. L. and Schmid, J. Some modifications of the multiple-c[...]
psychol. Measmt, 1953, 13, 574-594.
[3] Gulliksen, H. Theo[...]
[4] Loevinger, J. The attenuation pa[...]
[5] Sherriffs, A. C. and [...]
[...] chol. Measmt, 1953, 13, [...]
[...] Who is penalized by the penalty for guessing? J. educ. Psychol., 1954, 45, 81-90.
[6] Swineford, Frances. The measurement of a personality trait. J. educ. Psychol., 1938, 29,
295-300.
[...] and Training Research Center, February, [...] 23, ASTIA, Document No. 07892C.)
[9] Votaw, D. F. The effect of do-not-guess directions on the validity of true-false or
multiple choice tests. J. educ. Psychol., 1936, 27, 698-703.

Manuscript received 1/29/57
Revised manuscript received 8/15/57




PSYCHOMETRIKA—VOL. 22, No. 3 
SEPTEMBER, 1957 


THE UPPER AND LOWER TWENTY-SEVEN PER CENT RULE 


EDWARD E. CURETON


UNIVERSITY OF TENNESSEE 


A simplified re-derivation of the formula underlying the rule is pre- 
sented, followed by a derivation of the comparable rule for the unit- 
rectangular distribution, which turns out to be a 33-per cent rule. Critical 
comments are offered concerning two assumptions: normality of the score 
distribution and equality of mean standard errors of measurement in the 
high and low groups. 


While the use of upper and lower subgroups each containing 27 per cent 
of the total group is quite common in item analysis, it is interesting to note 
that Kelley's original proof [1, 2] has not been examined critically, and so 
far as the writer is aware, the derivation does not appear in any textbooks 
on psychological and educational statistics other than Kelley's ([3], pp. 300-301).
A simplified derivation is offered herewith, followed by a few critical
comments. 

Let s be the standard error of measurement of the criterion scores (the
standard response error of a single score). It will appear later that this
standard error of measurement may be either that appropriate to raw scores
or that appropriate to regressed scores without altering the argument. Then
the standard response error of the mean of all cases in one subgroup will
be s/√q for q cases in the subgroup. In particular, the unit-normal distri-
bution contains one case in the total group, and q, the proportion in one
tail, is fractional without affecting the validity of the previous statement.

In the unit-normal distribution, the distance from the mean of the
whole distribution to the mean of one tail is z/q, z being the unit-normal
ordinate at the baseline point x which separates the tail from the rest of
the distribution. The distance from the mean of the lower tail to the mean
of the upper tail is therefore 2z/q for symmetrical tails, and the standard
response error of this difference is s√(2/q). The critical ratio is

(1) $CR = \dfrac{2z/q}{s\sqrt{2/q}} = \dfrac{\sqrt{2}}{s}\cdot\dfrac{z}{\sqrt{q}}.$

The problem is to maximize CR by choice of x (and hence of q and of z, both
of which are functionally related to x). The factor √2/s is assumed not to
vary with x (or q or z), so the problem reduces to maximizing

(2) $f = z/\sqrt{q},$

and the formula defining s does not enter.
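(3) $\dfrac{df}{dx} = \dfrac{1}{\sqrt{q}}\,\dfrac{dz}{dx} - \dfrac{z}{2q\sqrt{q}}\,\dfrac{dq}{dx}.$

Now

(4) $z = \dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$

so that

$\dfrac{dz}{dx} = -x\,\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$

and from (4),

(5) $\dfrac{dz}{dx} = -xz.$

Also

$q = \dfrac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-X^2/2}\,dX,$

where X is any baseline value from x to ∞. Then at X = x, by the funda-
mental theorem of the calculus,

$\dfrac{dq}{dx} = -\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2},$

and from (4),

(6) $\dfrac{dq}{dx} = -z.$

Substituting from (5) and (6) in (3),

(7) $\dfrac{df}{dx} = \dfrac{z}{\sqrt{q}}\left(-x + \dfrac{z}{2q}\right).$

The condition for a maximum, which is obtained by setting the derivative
equal to zero, is then z/2q = x, or

(8) $\dfrac{z}{qx} = 2.$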


The Kelley-Wood table of the normal probability integral gives values
of x and z correct to six decimal places for argument q correct to three
decimals, and we find the following adjacent entries:

q | x | z | z/(qx)
.270 | .612813 | .330646 | 1.99835
.271 | .609791 | .331257 | 2.00454


EDWARD E. CURETON 295 


The values of z/(qx) are computed from the tabled values of q, x, and z. Linear
interpolation yields q = .270267 at z/(qx) = 2, and the rounded figure .27
is actually correct to three decimals.
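Condition (8) can also be solved numerically instead of by interpolating in the Kelley-Wood table. A minimal sketch using Python's `statistics.NormalDist` for z (its `pdf`) and q (one minus its `cdf`), with simple bisection as an assumed root-finding choice; it should agree with the interpolated q = .270267 to about four decimals.

```python
from statistics import NormalDist

_UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ratio(x):
    """z / (q x): normal ordinate over (upper-tail area times cut point)."""
    z = _UNIT_NORMAL.pdf(x)
    q = 1.0 - _UNIT_NORMAL.cdf(x)
    return z / (q * x)


def kelley_cut(lo=0.3, hi=1.0, iterations=100):
    """Bisect for the cut point x where z/(q x) = 2, i.e. condition (8).

    The ratio is above 2 at lo and below 2 at hi, so keep the half
    interval that still brackets the root.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if ratio(mid) > 2:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2
    return x, 1.0 - _UNIT_NORMAL.cdf(x)


x_cut, q_cut = kelley_cut()
print(round(x_cut, 3), round(q_cut, 4))  # 0.612 0.2703
```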

The assumption that the distribution of criterion scores is normal is
well known. Moderate departures from symmetry should have little effect,
but platykurtosis, which is usually found in distributions of scores on ex-
perimental tests (when we use the criterion of internal consistency), and
is to be expected [4], might change the picture appreciably. Consider the
extreme case: the unit-rectangular distribution. This distribution, like the
unit-normal distribution, is defined as having area = 1 and σ = σ² = 1. Its
ordinate z is constant, and its baseline has finite limits X = ±a, equidistant
from X = 0. For one-half the distribution, the area (the total frequency) is

$\dfrac{1}{2} = \int_0^a z\,dX,$

and the second moment, corresponding to $(1/2)\sum (X - \bar{X})^2/N$ in a discrete
distribution, is

$\int_0^a z X^2\,dX.$

The standard deviation is therefore

$\sigma = \sqrt{\dfrac{\int_0^a z X^2\,dX}{\int_0^a z\,dX}}.$

Since z is a constant,

$\sigma^2 = \dfrac{\int_0^a X^2\,dX}{\int_0^a dX} = \dfrac{a^3/3}{a} = \dfrac{a^2}{3}.$

Hence, since σ = σ² = 1, a² = 3, and

(9) $a = \sqrt{3}.$

Also, since the area is unity, az = 1/2, and

(10) $z = \dfrac{1}{2\sqrt{3}}.$

The distance from the mean of the whole distribution to the mean of one
tail is (x + a)/2, where x is again the baseline point separating the tail from
the rest of the distribution, and the distance between the means of the two
symmetrical tails is x + a. This distance has standard response error s√(2/q)
as before, and q = (a − x)z. The critical ratio is then

(11) $CR = \dfrac{(x + a)\sqrt{(a - x)z}}{s\sqrt{2}},$

and since both s and z are constants, the function to be maximized is

(12) $f = (x + a)\sqrt{a - x}.$

$\dfrac{df}{dx} = -\dfrac{x + a}{2\sqrt{a - x}} + \sqrt{a - x},$

and setting this derivative equal to zero,

(13) $x = a/3.$

Then from (9), x = 1/√3, and since q = (a − x)z, and from (10),
z = 1/(2√3),

(14) $q = \left(\sqrt{3} - \dfrac{1}{\sqrt{3}}\right)\dfrac{1}{2\sqrt{3}} = \dfrac{1}{3}.$

For moderately platykurtic distributions of criterion scores, therefore, the
subgroups should probably consist of the upper and lower 29 or 30 per cent
of the total group.
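The rectangular-case result can be checked numerically by maximizing (12) directly. A minimal sketch, assuming a = √3 and z = 1/(2√3) from (9) and (10) and using a ternary search (an assumed method, valid here because f rises and then falls on the interval from −a to a).

```python
from math import sqrt

A = sqrt(3.0)          # half-range a of the unit-rectangular distribution, from (9)
Z = 1.0 / (2.0 * A)    # constant ordinate z, from (10)


def f(x):
    """Quantity (12) to be maximized: (x + a) * sqrt(a - x)."""
    return (x + A) * sqrt(A - x)


def argmax_unimodal(func, lo, hi, iterations=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


x_star = argmax_unimodal(f, -A, A)
q_star = (A - x_star) * Z   # tail proportion q = (a - x) z
print(round(x_star, 4), round(q_star, 4))  # 0.5774 0.3333
```

The maximizer lands at a/3 = 1/√3 and the tail proportion at 1/3, matching (13) and (14).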

More serious, probably, than the assumption of normality is the assump-
tion that the standard error of measurement does not vary with score level,
which in this case reduces to the assumption that the mean value of s is the
same in the two symmetrical tails. When we use the criterion of internal
consistency, with an experimental test made up of recognition-type items
rather than recall or free-answer items, it is well known that moderately easy
items tend to exhibit higher internal consistency than do moderately hard
items. It is fairly reasonable to assume that the error variance is about the
same in the low group for easy items as in the high group for hard items. But
the easy items will contribute little additional error variance in the high
group, since almost everyone in that group will know the answers to most
of them and will not have to guess, while the hard items will contribute
much additional error variance in the low group. It is probable, therefore,
that the mean value of s will be greater in the low group than in the high
group. In this case the maximum CR will probably require more cases in the
low group than in the high group, but how many more is a problem requiring
further investigation.


REFERENCES 


[1] Kelley, T. L. Footnote 11 to Jensen, M. B., Objective differentiation between three
groups in education (teachers, research workers, and administrators). Genet. Psychol.
Monogr., 1928, 3, No. 5, 361.
[2] Kelley, T. L. The selection of upper and lower groups for the validation of test items.
J. educ. Psychol., 1939, 30, 17-24.
[3] Kelley, T. L. Fundamentals of statistics. Harvard Univ. Press, 1947.
[4] Lord, F. M. The relation of test score to the trait underlying the test. Educ. psychol.
Measmt, 1953, 13, 517-549.

Manuscript received 11/5/56
Revised manuscript received 12/26/56


BOOK REVIEWS 


J. Raymonp Gi n Object 
MON ERBERICH. Specime jectiv 
л єсїїгє Test Items—A Guide to Achievement Test 
Construction. New York: Longmans, Green and Co., 1956. Pp. ix + 436 


This book presents [...] to the problem of helping class-room teachers and other test [...] to improve their tests. It concentrates primarily on item writing, [...] types, and [...] item writers by providing models: a wide selection of [...] at all [...] from tests in all subject matter fields.

[...] basic emphasis on the importance of [...]; [...] primary function of this [...]; [...] important outcomes of [...] development of items which measure [...].

The book is divided into four parts. Part I devotes 19 pages to a brief outline of procedures in achievement test construction. Part II, in 195 pages, presents 227 excerpts from published tests. These excerpts are organized around ten important outcomes of instruction such as skills, knowledges, appreciations, attitudes, and adjustments. Part III, in 137 pages, presents a variety of schemes for classifying these sample items: by form, type [...], and by school subject and level. The items are also cross classified by pupil activity and [...], by [...] and learning outcome, by form and pupil stimulus, and [...]. Part IV, in 24 pages, considers briefly a variety of special test techniques such as [...] essay examinations, performance tests, and non-test tools [...].

A striking and valuable feature of the book is the extensive and well-organized lists of references. These occupy approximately 80 pages and include over 1,000 references to [...] journal articles. There is also a glossary of over 300 terms, emphasizing primarily those used to describe various forms of objective test items.

One of the problems faced by any compiler of sample test items is the development of a system for organizing the items. An ideal system would minimize the [...] of classification and maximize the ease of locating any desired specimen. Professor Gerberich has wisely chosen to emphasize major learning outcomes as the primary basis for classifying these sample items. But some of the terms used to identify these outcomes, terms which teachers use frequently with apparent understanding, seem to mean quite different things to different people. To deal with this problem the author begins each chapter on a particular learning outcome by citing one or more authoritative definitions of the term. While these definitions shed some light on the meaning of the term, they do not define it in the sense of setting up precise limits to its meaning. They seldom provide criteria which can be used to classify a given test item definitely as measuring one learning outcome rather than some other.

One result is that one finds items which seem almost identical in the task they present to the examinee classified under entirely different learning outcomes. For example, excerpts [...] and 107 both require the examinee to [...] shorthand symbols, but the first is classified as a measure of skill, while the second is classified as a measure of understanding. Excerpts 1 and 69 both require the examinee to distinguish complete sentences from [...] sentences, yet 1 is classified with measures of skills, whereas 69 is classified with [...] of concepts. Excerpts 7 and 159 both require the examinee to indicate what punctuation is needed in a sentence, but excerpt 7 is classified with the skill items while excerpt 159 is classified with items measuring applications. The difficulty here seems to lie chiefly with the vagueness of the category concepts.




Perhaps it would be fruitful to try building a system for organizing sample test items, not on the basis of these conventional labels for supposed learning outcomes, but rather on the basis of the tasks they present to the examinee. One might then arrive at a set of categories bearing descriptive labels like “interpretation of symbols,” “knowledge of word meanings,” “recall of factual information,” “ability to solve numerical problems,” and “ability to make correct decisions in practical problem situations.” These might provide a scheme for a somewhat more clearly determinate classification of test items.


[...] response items. Five-response items are treated [...] in which the responses are [...] are phrases, or numbers, or [...]. [...] appears to be devoted to matters [...]. [...] was deliberately intended, since Part II was organized primarily on the basis [...]. [...] has the impression that [...] varied capabilities of the [...] items widely used in classroom tests and in the better modern standardized tests of achievement. In this collection of 227 sample items there appear to be only eight straightforward multiple choice type items. Further, there appears to be no example of a multiple-choice item consisting of a direct question [...] knowledge of word meanings, of facts, of laws and principles, and of explanations, [...] to solve problems or make practical decisions.
What has just been said may suggest that if the reviewer had written this book he would have written it somewhat differently. This is true. But the aim and method of the book would have been essentially the same. And, one should not forget, Professor Gerberich has actually produced a book while the reviewer has not. It is a good book, one we have needed very much. I commend it to the classroom teachers for whom it was primarily written, and also to test specialists. They too will find it a valuable reference.

Iowa City, Iowa Robert L. Ebel


SIDNEY SIEGEL. Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill Book Company, 1956. Pp. xvii + 312.


First events are always difficult to judge since there is [...]. So it is with Siegel's Nonparametric Statistics, which purports to pull together for the first time the more important distribution-free statistics into [...]. [...] on statistics, it is a model of organization. The author takes the reader through [...] logic [...] alternative to the final step of making [...] result. I cannot recall ever seeing a book on statistics for the researcher as systematic as this one. Its faults are by and large sins of omission rather than of commission.

I said that Siegel purports to pull together the more important distribution-free statistics along with the tables necessary for determining their significances. He claims an author's prerogative to choose as he pleases, and in general he has covered the area quite nicely. However, it seems to this reviewer astonishing that Marshall's test, Tukey's corner test, the Mood-Brown distribution-free analysis of variance tests, all nonparametric tests of interaction (except by accident), Wilcoxon's T-test for groups of unreplicated samples, nonparametric tests of trend, and multi-dimensional χ² are omitted. Siegel does have a footnote to the effect that K. V. Wilson's article on multi-dimensional χ² came out after the book was written, but this seems a poor excuse, since there are discussions of this problem in such places as Mood and Rao as well as others, and it is a natural generalization of simple χ² problems. Hypotheses about interaction occur often in psychological research; it seems extremely important to include some of the nonparametric tests of interaction. It also seems important that extensions to more complex designs be given, and this involves χ² as the principal tool.

It is also surprising that the use of nonparametrics in estimating population parameters, e.g., rank percentile levels, was not included. This omission could be justified easily on the basis of space requirements, but such an omission of estimation procedures does weaken the usefulness of the book as a textbook.

There are at least three points in the book which can mislead people. First, the Mann-Whitney U-test is a test of location only when the two samples being studied come from populations with identical shapes. In fact, this test is a test of rapidity of build-up from a specified direction. If one population is extremely skewed and the other not, it is possible to obtain a significant U even though the two populations do not differ in location. The only secondary source which makes this point clear is Keith Smith's chapter in Festinger and Katz.
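Solley's first point can be checked numerically. The sketch below is our own illustration, not taken from the book or the review. It computes the Mann-Whitney U from midranks and applies it to two hypothetical simulated populations that share the same median (zero): one symmetric (standard normal) and one strongly skewed (a shifted lognormal, exp(Z) - 1). With samples this large, U departs reliably from its null expectation n1·n2/2 even though neither population is shifted in location relative to the other.

```python
import math
import random
import statistics


def mann_whitney_u(x, y):
    """U for sample x: the number of (x_i, y_j) pairs with x_i > y_j,
    ties counted as 1/2, computed from midranks of the pooled sample."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie run
        n_from_x = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_x += midrank * n_from_x
        i = j + 1
    n1 = len(x)
    return rank_sum_x - n1 * (n1 + 1) / 2.0


def u_to_z(u, n1, n2):
    """Large-sample normal approximation to U (no tie correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd


rng = random.Random(1957)
n = 2000
symmetric = [rng.gauss(0.0, 1.0) for _ in range(n)]
skewed = [math.exp(rng.gauss(0.0, 1.0)) - 1.0 for _ in range(n)]  # median 0

u = mann_whitney_u(symmetric, skewed)
print("medians:", statistics.median(symmetric), statistics.median(skewed))
print("P(X > Y) estimate:", u / (n * n), "z:", u_to_z(u, n, n))
```

Both sample medians land near zero, yet the z statistic is far from zero. The test is responding to the difference in shape, not to a difference in location.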

Second, it is easy to overinterpret the author's advocacy of nonparametric statistics over the parametric type, in spite of some warnings he gives. (This has been borne out in several discussions this reviewer has had with several researchers who have been using this book.) That is, nowhere does the author mention transformations which normalize nonnormal distributions. Since parametric statistics are always more powerful than nonparametric statistics if the distributions are normal, it would seem wise to point out the possibilities of such transformations. For example, even though one has a small sample of subjects in a given experiment dealing with reaction time, he should know, from the accumulated research of others, that a logarithmic transformation will normalize the distribution of his scores and is to be preferred over a less powerful nonparametric statistic. One gets the impression that one always has to estimate the shape of the population distribution of scores from one's own sample, which in fact may be quite small. There are many instances in psychological and social science research where there is a considerable backlog of information concerning score distributions, and such information should be utilized in deciding whether to transform (normalize) the distribution of scores and use parametric tests or whether to use only nonparametrics.
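The reviewer's reaction-time example can be illustrated with a small simulation. For illustration only, the sketch assumes that reaction times are lognormal, a common modelling assumption rather than a claim made in the review. Under that assumption the raw scores are strongly right-skewed while their logarithms are symmetric, so a parametric test on the log scores becomes reasonable. The moment-based skewness helper is our own.

```python
import math
import random


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (moment estimator)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)
# Hypothetical reaction times in ms: lognormal with median exp(6), about 403 ms.
reaction_times = [math.exp(rng.gauss(6.0, 0.5)) for _ in range(5000)]
log_times = [math.log(t) for t in reaction_times]

print("skewness of raw times:", round(skewness(reaction_times), 2))
print("skewness of log times:", round(skewness(log_times), 2))
```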

The third point also stems from Siegel's enthusiasm for nonparametric statistics. It concerns his strictures concerning scales of measurement. There are many statisticians who do not worry about whether they are using an ordinal or an equal interval or a ratio type scale so long as the distributions are approximately normal in shape. As yet nonparametrics are extremely limited in their application; they do not exist for most multivariate problems. It is easy to be misled by Siegel's discussions of scales of measurement into being too cautious. Frankly, we do not know as yet what effect the type of scale has on parametric statistics, if any. Most empirical investigations show little or no effect. The reader needs to be cautious about indiscriminately abandoning parametric statistics solely on the basis of scales of measurement.
In summary, this book is an excellently organized presentation with many valuable statistics which can be profitably used. Though many worthwhile distribution-free tests are omitted, its virtues far outweigh its limitations, and I can only say “you ought to have it; it is well worth while.”
Charles M. Solley 


The Menninger Foundation 




BOSE, R. C., CLATWORTHY, W. H., and SHRIKHANDE, S. S. Tables of Partially Balanced Designs with Two Associate Classes. Raleigh, N. C.: Institute of Statistics of the Consolidated University of North Carolina; N. C. Agricultural Experiment Station Tech. Bull. No. 107, 1954. Pp. iv + 255.

BINET, F. E., LESLIE, R. T., WEINER, S., and ANDERSON, R. L. Analysis of Confounded Factorial Experiments in Single Replications. Raleigh, N. C.: Institute of Statistics of the Consolidated University of North Carolina; N. C. Agricultural Experiment Station Tech. Bull. No. 113, 1955. Pp. 64.


The first monograph under review (Bose et al.) is devoted to the partially balanced incomplete block designs. These are members of a class of designs in which each experi[...] [...] experimenter more variety and greater freedom in [...] treatments, replications, etc., are concerned. The[...] psychology, particularly when the analysis [...] [...]tensively cultivated as learning and perception [...] so far from standardized that an investigator frequently has no idea what his results will look like beyond certain [...] squares. He is rarely prepared to interpret or to rationalize all the [...] components. Psychologists who have learned through considerable [...] expect from their techniques and their subjects, and who are interested in [...] hypotheses for which these methods are appropriate, however, will find [...] useful guide.


Washington, D. C. 


PSYCHOMETRIKA—VOL. 22, NO. 4
DECEMBER, 1957 


NEW PROBLEMS FOR OLD SOLUTIONS* 


HUBERT E. BROGDEN


PERSONNEL RESEARCH BRANCH


THE ADJUTANT GENERAL'S OFFICE†


Some of the methodologies that have become standard tools in psycho- 
metrics suffer from neglect. They are taken too much for granted and are not 
given the attention that seems appropriate to the important role they play 
in research advances. I propose to make some suggestions which may, in 
a modest way, assist in alleviating this difficulty. In a very general way I 
would like to suggest that effort expended in examining a variety of restate- 
ments of a methodological problem may lead to new methodologies of real 
value. 

In a sense then, I am suggesting that we look for new problems in areas 
where old solutions are available, or possibly, for problem restatements 
where the application of an old solution may have become too automatic 
and too uncritical. 

Since a discussion of all of this in general terms would tend toward 
triteness, I plan to proceed in part through the use of examples with the 
thought in mind that a number of such examples (bearing upon similar 
problems) may be of some additional value in suggesting a generalized ap- 
proach to certain classes of problems. For simplicity, I will avoid problems 
associated with sampling error and limit the discussion throughout the paper 
to cases involving very large samples.

An article by Guilford and Michael entitled “Approaches to Univocal Factor Scores” illustrates the kind of problem restatement that I have in mind. A solution to the problem of estimating factor scores, providing estimates that are best in the least squares sense, has been available for some time. However, it has been frequently observed that the least squares estimates of orthogonal factors tend to intercorrelate substantially. Having in mind, possibly, this apparent defect of scores estimated through the least squares method, Guilford and Michael suggest as an alternative approach scoring or weighting procedures designed to yield a univocal factor score: a score having variance in only one common factor, its remaining variance being attributable to errors of measurement plus possible specific variance. This restatement emphasizes the reduction of bias or contamination and relegates accuracy of measurement to a secondary role.

*Presidential Address to the Psychometric Society, September 4, 1957.
†The [...] expressed are those of the author and do not reflect official Department of the Army policy.
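The intercorrelation of least-squares (regression-method) estimates of orthogonal factors can be reproduced from the factor model alone. The sketch below makes three assumptions: unit-variance orthogonal factors, a test correlation matrix R = LLᵀ + Ψ with diagonal uniquenesses Ψ, and a hypothetical loading matrix L of our own choosing. It uses the standard identity LᵀR⁻¹L = M(I + M)⁻¹ with M = LᵀΨ⁻¹L, which gives the covariance matrix of the estimated scores without inverting R.

```python
def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regression_score_cov(loadings):
    """Covariance matrix of regression (least-squares) factor-score estimates,
    L^T R^{-1} L, for orthogonal unit-variance factors, via M (I + M)^{-1}."""
    n_tests, k = len(loadings), len(loadings[0])
    psi = [1.0 - sum(l * l for l in row) for row in loadings]  # uniquenesses
    m = [[sum(loadings[i][a] * loadings[i][b] / psi[i] for i in range(n_tests))
          for b in range(k)] for a in range(k)]
    inv = invert([[m[a][b] + (1.0 if a == b else 0.0) for b in range(k)]
                  for a in range(k)])
    return [[sum(m[a][c] * inv[c][b] for c in range(k)) for b in range(k)]
            for a in range(k)]


# Hypothetical loadings: four tests, two orthogonal factors, small cross-loadings.
loadings = [[0.8, 0.3], [0.7, 0.2], [0.3, 0.8], [0.2, 0.7]]
cov = regression_score_cov(loadings)
corr = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
print("variance of each estimate:", round(cov[0][0], 3))  # 0.727
print("correlation between estimates:", round(corr, 3))  # 0.184
```

The factors are uncorrelated by construction, yet their estimates correlate about .18. With pure simple structure (no cross-loadings) the same function returns a zero off-diagonal entry.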

While I do not wish to consider this problem in greater detail or attempt to describe the solution, I do have a further point regarding the justification of the problem restatement, which is pertinent here and will be pertinent in the examples to be discussed later. In considering possible methods for estimating factor scores, an early question might well be: how will the factor scores be used, or what kinds of conclusions will be drawn and what kinds of decisions will be made as a result of their use? It seems desirable that the statement of the problem should be phrased so as to maximize the likelihood that such conclusions or decisions will be correct. This may often lead to a number of alternate problem statements, since the same problem statement may not permit correct conclusions in research studies with various objectives.

Let us consider, for a moment, the factor [...] these questions. I realize that many investigators [...] consistent with the use of the [...], for example, to report estimates [...] for subjects taking a set of tests. To illustrate the role of conclusions or decisions in the statement of the problem, let us consider a class of studies in which the investigator seeks to extend his knowledge of a set of factors by relating the factor score estimates to a set of new variables. This will permit us to relate the conclusions resulting from such a study to the statement of the problem.

An investigator with such a purpose in mind will draw no conclusions [...] adversely affected.


A consideration of the problem of estimating a composite criterion score from unreliable components will provide a further opportunity to illustrate the kind of problem restatement I have in mind. Suppose that a set of criterion components are available which are deficient only in that error of measurement is present. The problem, in general, is the selection of weights for the components that will yield the best estimate of the true composite.




A least squares solution to this problem is possible and has often been 
proposed as best. If, however, we reconsider the statement of the problem 
and examine particularly the nature of the conclusions to be drawn when the 
estimate of the criterion composite is used for validation research, it can be 
shown that the least squares solution is irrelevant and gives weights nega- 
tively related to those provided by the proper solution. 

Assuming that the criterion composite is to be used for validation studies, 
the validity coefficients or the partial regression weights of the predictors 
are of central importance, since these are basic to the conclusions deriving 
from the validation study. The following restatement of the problem is 
suggested after considering the intended use of the criterion: what set of 
weights for the fallible criterion components will insure that, in a later vali- 
dation study, the validity coefficients (or the partial regression weights) 
obtained against the estimated composite will be the same as those that would
be obtained if the true composite were available? 

R. H. Gaylord and I have considered this problem and a solution has been 
achieved. An interesting aspect of the solution is the relationship between 
the magnitude of the weights and the reliability of the components—the less 
reliable the component the larger the weight, when the components are in 
standard score form. To understand this point note that—when the compo- 
nents have unit variance—if more error variance is present less true score 
variance will remain. Hence, a heavier weight is needed if the true score 
variance of an unreliable component is to be proportionately represented in the over-all criterion composite.

With a least squares solution the opposite is true: the greater the error in the criterion component, the lower the least squares weight. The least squares method yields a composite that has maximum correlation with the true composite but which is biased for the purpose of validation research.
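A small numerical sketch of the contrast between the two weighting schemes follows. It rests on our own simplifying assumptions and is not necessarily the solution Brogden and Gaylord obtained. The components are in standard-score form with reliabilities r_i. The target true composite is T = Σ a_i τ_i, where τ_i is the standardized true score of component i. Errors are uncorrelated with everything else. For the least-squares weights only, the components are also assumed mutually uncorrelated. Under these assumptions, weights w_i = a_i / √r_i make the covariance of any predictor with the fallible composite equal its covariance with T, while the least-squares weights come out as a_i √r_i.

```python
import math


def unbiased_weights(reliabilities, true_weights):
    """w_i = a_i / sqrt(r_i): cov(p, sum w_i x_i) = cov(p, sum a_i tau_i) for any
    predictor p whose only link to x_i is through the true score."""
    return [a / math.sqrt(r) for r, a in zip(reliabilities, true_weights)]


def least_squares_weights(reliabilities, true_weights):
    """Regression weights of T on the x_i when the components are mutually
    uncorrelated with unit variance: b_i = cov(x_i, T) = a_i * sqrt(r_i)."""
    return [a * math.sqrt(r) for r, a in zip(reliabilities, true_weights)]


def predictor_covariance(weights, reliabilities, cov_p_tau):
    """cov(p, sum w_i x_i) given cov(p, tau_i); cov(p, x_i) = sqrt(r_i) cov(p, tau_i)."""
    return sum(w * math.sqrt(r) * c
               for w, r, c in zip(weights, reliabilities, cov_p_tau))


reliabilities = [0.9, 0.6, 0.3]     # hypothetical component reliabilities
true_weights = [1.0, 1.0, 1.0]      # equal weight on standardized true scores
cov_p_tau = [0.40, 0.10, 0.25]      # hypothetical predictor/true-score covariances

w = unbiased_weights(reliabilities, true_weights)
b = least_squares_weights(reliabilities, true_weights)
target = sum(a * c for a, c in zip(true_weights, cov_p_tau))
print("unbiased weights:", [round(v, 3) for v in w])       # grow as r falls
print("least-squares weights:", [round(v, 3) for v in b])  # shrink as r falls
print(predictor_covariance(w, reliabilities, cov_p_tau), "vs target", target)
print(predictor_covariance(b, reliabilities, cov_p_tau), "(least squares)")
```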

The approaches to this problem and to the problem of factor score 
estimation have much in common. In each instance the least squares solution 
was, perhaps, the more obvious one. The problem restatement could in each 
case be derived from an examination of the way in which the solution would 
affect the conclusions of research studies in which it was applied. There is a
further point of similarity. The criterion estimation problem might have 
been stated: what estimated criterion composite will be collinear with the 
true composite? 

A restatement of the problem of item difficulty distribution may throw still further light on the general point I have in mind. A number of investigators have in various ways studied the relation of tests to underlying ability and have shown for various conditions the characteristics of items and of item difficulty distributions that will yield the most efficient measurement of underlying ability. While efficiency of measurement has been defined in a number of ways, all of the definitions resemble the least squares solution in that all are concerned with maximizing some index of the degree of relationship between the test and underlying ability. Ferguson and others have discussed the problem associated with difficulty factors and have stressed the way in which correlations among tests may be distorted as a function of similarity or dissimilarity in the difficulty distribution of the items.
Since the phenomenon of difficulty bias in correlations is basic to the problem restatement I wish to propose, further explanation of this bias seems desirable at this point. It is well known that if the p-values of two dichotomous variables are dissimilar, the phi coefficient will tend to be low, and this will hold although the tetrachoric correlations between all pairs of items are equal. Now, with test scores the same phenomenon is evident, particularly if the tests are homogeneous in difficulty. Two tests, each homogeneous in difficulty, will correlate more highly if the difficulty levels of the two tests are approximately the same than if the difficulty levels are divergent. Many test types proposed as efficient measures of underlying ability in the least squares sense have items homogeneous in difficulty. Hence, the correlations among such tests and between such tests and other variables are also subject to difficulty bias, probably more so than with tests having a greater spread of item difficulty.
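The difficulty bias in phi can be seen directly. The sketch below is our own and uses only the Python standard library. It assumes two items driven by a standard bivariate normal latent variable with a fixed tetrachoric correlation rho. An item with proportion passing p is passed when its latent value exceeds the normal deviate that cuts off p in the upper tail. The joint passing proportion is found by Simpson's rule, and phi follows from it. Holding rho fixed, phi drops as the two p-values diverge.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def both_pass(a, b, rho, steps=4000, upper=8.0):
    """P(X > a, Y > b) for a standard bivariate normal with correlation rho,
    integrating pdf(x) * P(Y > b | x) over x in [a, max(upper, a + 1)]."""
    s = (1.0 - rho * rho) ** 0.5
    hi = max(upper, a + 1.0)
    h = (hi - a) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        f = STD.pdf(x) * (1.0 - STD.cdf((b - rho * x) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def phi_from_tetrachoric(p1, p2, rho):
    """Phi coefficient of two items with passing proportions p1, p2 whose
    latent (tetrachoric) correlation is rho."""
    a, b = STD.inv_cdf(1.0 - p1), STD.inv_cdf(1.0 - p2)
    p11 = both_pass(a, b, rho)
    return (p11 - p1 * p2) / (p1 * (1 - p1) * p2 * (1 - p2)) ** 0.5


rho = 0.5
for p1, p2 in [(0.5, 0.5), (0.5, 0.7), (0.5, 0.9)]:
    # phi falls as the p-values diverge, although rho is the same throughout
    print(p1, p2, round(phi_from_tetrachoric(p1, p2, rho), 3))
```

For p1 = p2 = .5 the result matches the closed form (2/π)·arcsin(rho), which equals 1/3 when rho = .5.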


[...] statement is possible and, for certain purposes, [...].

Where a contribution to subject matter knowledge is the object of an investigation, it seems proper that elimination of bias is all-important and reduction of error of measurement is of secondary importance, particularly since, with knowledge of error of measurement, methods of estimation are often available and appropriate which will make allowance for the attenuating effect of error. [...] as would underlying ability, regardless of the nature of [...]. If the correlations between the test and other variables are, after correcting for the effect of error, the same as the correlations between underlying ability and such variables, the conclusions or decisions reached will be the same.


I cannot offer, with proof, an exact statement of the over-all problem and an accompanying solution. While the foregoing discussion is possibly sufficient in view of the theme and scope of this paper, a further problem statement and a possible solution may be of some interest. The justification of these further developments must remain largely intuitive.


A brief indication of the major assumptions and limiting conditions 
may be helpful before continuing with the new problem statement. Obvi- 
ously, in this brief discussion it is not feasible to state these in full. Underlying 
ability is defined as a perfect normally distributed measure of the ability 
common to the dichotomous items. The problem is limited to tests in which 
all items have the same biserial correlation with underlying ability. Ability 
and error are assumed to be the only determiners of the item responses. 

If I rephrase the problem statement and ask: what item difficulty 
distribution will yield a test score such that the bivariate frequency surface 
of the test score and underlying ability is normal, the statement then appears 
to be more precise and seems to be a more feasible starting point in a mathe- 
matical development leading to a demonstrated solution. 

This more precise restatement is, I believe, logically equivalent to the prior and more general restatement of the problem. From this second restatement it follows that the test is a simple linear function of underlying ability and error, and that the test is described thus through its entire range, given only the product moment correlation between the test and ability. It also follows, then, that the correlation between ability and any other variable can be estimated, given the correlation between this variable and the test and, of course, the correlation between ability and the test. A linear model equivalent to that used in factor analysis is applicable. Under it, the correlation between the test and the other variable is the product of the test-ability correlation and the ability-variable correlation, so the desired estimate is the test-variable correlation divided by the test-ability correlation. It is emphasized that this model is believed to hold regardless of difficulty biases that may be present in the other variable. Hence, when the bivariate surface of the test and ability is normal, it seems reasonable that the test can be used in place of ability and, with correction for error of measurement, the conclusions reached through the use of the test are the same as those that would be reached had underlying ability been available.
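Under a linear model of this kind, the test-variable correlation factors as r_ty = r_ta · r_ay, so the correlation of ability with another variable can be recovered by division. The simulation below is a sketch of that step under stated assumptions, not Brogden's procedure. All population values are hypothetical, and ability, test error, and the other variable's unique part are generated as independent standard normals. When r_ta is taken to be the index of reliability √r_tt, the division reduces to the familiar correction for attenuation in the test alone.

```python
import math
import random


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ability_correlation(r_test_var, r_test_ability):
    """Estimate r(ability, variable) = r(test, variable) / r(test, ability)."""
    return r_test_var / r_test_ability


rng = random.Random(7)
r_ta, r_ay = 0.8, 0.5          # hypothetical population correlations
n = 100_000
ability = [rng.gauss(0, 1) for _ in range(n)]
test = [r_ta * a + math.sqrt(1 - r_ta ** 2) * rng.gauss(0, 1) for a in ability]
other = [r_ay * a + math.sqrt(1 - r_ay ** 2) * rng.gauss(0, 1) for a in ability]

r_ty = pearson(test, other)    # observed, attenuated: about r_ta * r_ay = 0.40
print(round(r_ty, 3), round(ability_correlation(r_ty, r_ta), 3))  # second near 0.5
```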

The item difficulty distribution that I have in mind as a possible solution to this problem is perfectly rectilinear, with the item difficulty index expressed as baseline values of a normal curve. To achieve this distribution, items would be selected with difficulties of 0, +.1, -.1, +.2, -.2, +.3, -.3, etc. In theory, such a distribution would place items at equal difficulty intervals ranging from plus infinity to minus infinity.
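A sketch of the rectilinear difficulty distribution follows, truncated to a finite range because an infinite series of items is not available. It assumes that item i is passed when ρθ + √(1 - ρ²)·e_i exceeds its normal-deviate difficulty d_i, with e_i independent standard normal and ρ the common biserial correlation. This model is our assumption, consistent with the limiting conditions stated earlier but not taken from the paper. Under it, the expected number-right score rises with ability at a nearly constant rate ρ/Δ, where Δ is the spacing between difficulties. That linearity is one ingredient of the linear test-ability relation discussed above, not a proof of it.

```python
from statistics import NormalDist

STD = NormalDist()


def rectilinear_difficulties(half_width=5.0, step=0.1):
    """Normal-deviate difficulties 0, +step, -step, +2*step, -2*step, ...
    out to +/- half_width (a finite stand-in for the infinite series)."""
    k = round(half_width / step)
    diffs = [0.0]
    for j in range(1, k + 1):
        diffs += [j * step, -j * step]
    return diffs


def expected_score(theta, difficulties, biserial):
    """Expected number-right score at ability theta under the assumed model."""
    s = (1.0 - biserial ** 2) ** 0.5
    return sum(STD.cdf((biserial * theta - d) / s) for d in difficulties)


diffs = rectilinear_difficulties()
rho = 0.6
for theta in (-1.5, -0.5, 0.5):
    slope = expected_score(theta + 1.0, diffs, rho) - expected_score(theta, diffs, rho)
    print(theta, round(slope, 3))  # each close to rho / step = 6
```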

Now, the general point of this paper had to do with the value of examining alternate statements of the problem, and I believe these three examples illustrate this general point. I have been developing as a second general point the need for examining the conclusions or decisions to be made when the methodologies under consideration are to be applied, and the chain of reasoning from such conclusions to a justification of the methodology. This point has been stressed sufficiently and discussed in relation to the three examples.


The distinction between the kind of problem statement that leads to a least squares solution (or something akin to such a solution) and the alternate problem statements that we have considered deserves some extra comment.

We have noted that if, in each case, the scores were to be used for practi- 
cal estimation of an individual's standing, the least squares solution would 
likely be satisfactory. 

If interest did not center on direct use of the scores and if the scores 
were to be used as a means of arriving at further conclusions through addi- 
tional research, alternate problem statements pointing toward reduction of 
bias have been suggested as more pertinent and more acceptable. 

The added suggestion I wish to make at this point deviates from the 
central theme of the paper and relates to the above similarities in the three 
examples. I am suggesting merely that the above-noted distinction may 
extend beyond these three examples. In additional problems where the least 
squares solution has been accepted as best, a close examination of the problem 
in relation to the decisions to be made when the resulting method is used may 
again suggest an alternate problem statement. In other words, the particular 
distinctions between the least squares problem statement and the reduction
of bias problem statement may have more general value. 

The latter portion of this paper will be directed toward possible sources 
of confusion between different classes of problems or problem statements 
rather than toward restatements of problems as such. 

A very general distinction in the methodologies widely used in psychology is relevant in several ways to the present discussion, although little of what I have to say is really new or different. I am speaking of the distinction between a correlational approach and approaches primarily based on controlled experimentation. I would like to discuss these two very general approaches in relation to classes of practical decisions properly stemming from empirical evidence. I am choosing cases involving practical decisions so that certain points can be made most clearly, not because the points I wish to make are necessarily limited to cases involving practical decisions.

If the practical decision is a choice between administering or not administering a given treatment, it is well recognized that a controlled experiment is properly used to demonstrate the effect of the treatment. Knowledge of the effect of the treatment then becomes a major factor in deciding whether or not the treatment will be used. I have no real comment here. I believe that few would hold that a correlational study, without experimental controls, is proper backing for such a decision.

While the class of decision for which correlational evidence is appropriate is fairly well recognized in practice, it is somewhat more difficult to find a clear statement enjoying widespread agreement in discussions of scientific method. I should like to suggest at least one type of practical decision where a correlational design is clearly pertinent. I mean, specifically, a decision to use or not to use a test or measure for the identification and hiring of personnel.
With regard to this type of decision, I would like to make two points: 
(1) a correlation design will show what criterion performance can be expected from persons with a given test score, thus giving information basic to the decision in question,
and
(2) a controlled experiment (showing the relationship between a test and an appropriate criterion with other variables held constant) may suggest but does not demonstrate the value of this independent variable for selection purposes.
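Point (1) is what an expectancy table summarizes. The sketch below tabulates the mean criterion performance within test-score bands from paired data. The cut points, data, and function name are hypothetical. A real validation would use a large sample drawn from the applicant population, which is the kind of sample a correlational design supplies.

```python
from bisect import bisect_right


def expectancy_table(test_scores, criterion, cut_points):
    """(count, mean criterion) within test-score bands. Band 0 is below
    cut_points[0]; band j covers [cut_points[j-1], cut_points[j])."""
    cuts = sorted(cut_points)
    sums = [0.0] * (len(cuts) + 1)
    counts = [0] * (len(cuts) + 1)
    for x, y in zip(test_scores, criterion):
        band = bisect_right(cuts, x)  # index of the band containing x
        sums[band] += y
        counts[band] += 1
    return [(counts[j], sums[j] / counts[j] if counts[j] else None)
            for j in range(len(cuts) + 1)]


# Hypothetical paired data: test score and a later criterion rating.
scores = [12, 15, 18, 22, 25, 27, 31, 33, 36, 40]
ratings = [2.0, 2.5, 2.0, 3.0, 3.5, 3.0, 4.0, 3.5, 4.5, 4.0]
for n, mean in expectancy_table(scores, ratings, cut_points=[20, 30]):
    print(n, mean)
```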
My point regarding the correlation design should be clear and acceptable 
without elaboration. The second point calls for further discussion. 
A true controlled experiment is in some ways quite meaningless when a test of an ability or personality trait is the independent variable. A test score cannot be meaningfully manipulated; the individual differences in the test score must be taken as they come or created by selection of cases. Moreover, experimental controls are difficult to accomplish. Such controls must again be achieved by selection of cases. Most important, however, it is difficult to define the “other variables” that are to be held constant in a controlled experiment. Consider, for example, the consequences of holding constant an alternate form of the test used as the independent variable, or the consequences of holding constant a number of tests so chosen that the common-factor variance of the independent variables will be reduced to zero.
If we disregard the problems I have just raised and assume that a test has been found to predict a criterion, and that all other variables were held constant through selection of cases, an additional difficulty still arises. With selection of cases, the sample in which this relationship is demonstrated can no longer resemble the sample in which the application must take place, and the relationship discovered in the validation sample cannot be applied with confidence. To further clarify this point, consider the kind of selection procedure that is supported by the evidence of a controlled experiment. The two steps of the procedure are: (1) selection of applicants to duplicate the effect of the operations used in the validation sample to hold all other variables constant, and (2) within the remaining applicants, selection of those with high scores on the test under investigation. Needless to say, this two-step procedure is not appropriate to the practical problem.

Although, as I suggested earlier, the thoughts expressed with regard to these two designs are not new, I hope that the examination of the designs in relation to the decisions to which they are pertinent may have provided some new insight into the distinction between these problems.

A second general distinction between classes of problems arises in connection with scaling. Consider, as one problem, the search for units of measurement that have the properties of a true scale. Many authors have struggled with this problem as it relates to the general methodology of science, and [...] most agree to a number of desirable features of so-called true scales. Such scales are fairly common in the physical sciences. In psychology, they are sought after but rarely achieved.


A second class of scaling problem is, I believe, distinctly different from that associated with true scales. If the s[...] [...] theory, we are then seeking [...] properties necessary to permit the decision [...]. [...] criterion problems for [...] metric and the notion [...] behind true scales and [...] a dollar unit is a true [...] senses relevant to decisions [...] to this kind of decision, has no [...] problems mentioned earlier. If we con[...] [...] are all very easy or all very difficult. In other words, we seek a count of behaviors, and we seek them at a difficulty level such that they can be properly evaluated as representing profits or losses to a decision maker, assuming that the purpose of the decision maker is to maximize profits.


I suspect that this difference in the purposes of scaling, as seen in a practical decision problem on the one hand and in the development of a general body of scientific knowledge on the other, can be differentiated further. I suspect, also, that many investigators have not distinguished between these two types of scaling problems and that the scales developed may have been less adequate or less suitable as a result.


In summary, let me point again to several of the major points of this paper. Let me repeat and emphasize my belief that, in developing a methodology, we must closely examine the decisions to be made or the conclusions to be drawn when the methodology is applied. [...]ology, and the chain of reasoning, in my opinion, best proceeds from a definition of the decisions to be made to a justification of the methodology.

The three examples involving a distinction between the least squares solution and solutions offering measures free of bias have suggested a second point. I believe that this distinction can be usefully applied in other contexts and that some added insight will obtain.

HUBERT E. BROGDEN 309

Finally, I hope that the foregoing has clarified my most general thesis 
and that I have given some support to the notion that effort expended in 
seeking new problem statements can be profitable. 


Manuscript received 9/4/57 




PSYCHOMETRIKA—VOL. 22, NO. 4 
DECEMBER, 1957 


OPTIMAL TEST LENGTH FOR MULTIPLE PREDICTION: 
THE GENERAL CASE* 


PAUL HORST AND CHARLOTTE MACEWAN
UNIVERSITY OF WASHINGTON 

The concepts of differential prediction and multiple absolute prediction were developed in earlier papers [2, 3]. Methods for the optimal distribution of testing time for each type of prediction are available [4, 5] and are appropriate for use provided that no altered time allotment approaches zero. In this article the methods developed in [4, 5] are extended to include cases where the altered time allotment for one or more tests may approach zero. The procedures developed are illustrated by numerical examples, after which the mathematical rationales are provided.


In previous publications [2, 3, 4, 5] the problem of maximum validity in predicting multiple criteria was approached in two different ways. In [2] and [3], for predicting criteria differentially and for multiple absolute prediction, respectively, techniques were developed for selecting from a large number of potential predictors that subset, of specified size, which yields the highest over-all validity as measured by the respective indices of prediction efficiency, φ and λ. A more general approach was used in [4] and [5], in which a procedure previously presented for the case of a single criterion [1] was extended to the cases of differential prediction and multiple absolute prediction, respectively. Here, techniques were presented whereby, starting with a given battery of predictors for differential prediction or for multiple absolute prediction, one could determine altered administration time allotments, for any specified over-all testing time, for which the index of prediction efficiency (φ or λ, respectively) would be a maximum.

The techniques developed in [4] and [5] provide methods of solving for optimal test lengths, in terms of time allotments, by series of approximations. Since reciprocals of the altered time allotments are involved, the methods do not hold in the event that any altered testing time becomes zero. In this article a modification of procedure, applicable also in the case in which the new time allotment for any test approaches zero, is presented.

A numerical example for the case of differential prediction, and a summary for the case of multiple absolute prediction, follow in the next section. The mathematical basis for the procedures described for the general case is presented in the final section.

*This research was carried out under Contract Nonr-477(08) between the University of Washington and the Office of Naval Research. The authors express their appreciation to Shun Mei Ling for carrying out the computations, and to Elizabeth [...] the manuscript for publication.


Numerical Examples 


The General Case for Differential Prediction 


The example below demonstrates a modification of the computational 
procedure presented in [4] such that its applicability is perfectly general. 
The assumptions stated for the more restricted case [1, 4] also apply in the 
general case but will not be repeated here. 

The data used in this example are those used in [4]. The matrix of test 
intercorrelations with reliabilities in the diagonal is shown in Table 1. Criterion 
variables are grade-point averages in each of ten college areas. The matrix 
of validity coefficients is shown in Table 2. 

Over-all testing time for the tests of arbitrary length is 142 minutes. 
Assume, as was the case for the example in [4], that the total testing time is to 
be cut in half, that is, to 71 minutes. The problem is to determine time 
allotments for the various tests such that the resulting index of differential 
prediction efficiency is maximized. The following method of solution employs 
a series of approximations differing from that presented in [4]—with the 
exception of the first iteration, no reciprocals of the altered time allotments 
are involved. 

It will be demonstrated that the results obtained by the original and the modified procedures are, for practical purposes, virtually identical. Since we start with no near-zero test lengths, the somewhat shorter method described by steps 1-8f in [4] may be used to obtain the second approximation to optimal test lengths. In brief, by these steps we determine:

1. The $a_c'$ matrix shown in Table 3. This is obtained from Table 2 by subtracting the mean of column i from each element in column i.
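Step 1 can be sketched in a few lines. The sketch assumes, as the description suggests, that Table 2 has one row per criterion and one column per test. `deviation_form` and the small matrix below are illustrative only and are not the data of Table 2.

```python
def deviation_form(rc_rows):
    """Subtract each column (test) mean from every element of that column.

    rc_rows: list of rows, one per criterion; each row holds one validity per test.
    """
    n_rows = len(rc_rows)
    n_cols = len(rc_rows[0])
    # Mean validity of each test over all criteria.
    means = [sum(row[j] for row in rc_rows) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in rc_rows]


# Hypothetical 3-criterion, 2-test example (not the data of Table 2).
example = [[0.30, 0.50],
           [0.10, 0.20],
           [0.20, 0.20]]
dev = deviation_form(example)
```

Every column of `dev` sums to zero, up to rounding. That is what it means to express the validities in deviation form for each test.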

2. The elements in the diagonal matrix, A, shown in row 2 of Table 4. 
Each element is the original test length, given in row 1 of Table 4, multiplied 
by the corresponding unreliability. 
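Row 2 of Table 4 can be sketched the same way. The function name is hypothetical. The check uses the first test, whose original length is 25 minutes (the first entry of the original-length rows of Tables 15 and 16) and whose reliability is .920 (the first diagonal element of R quoted in step 6-7). The result should match the 2.00 used in step 8.

```python
def unreliability_weights(lengths, reliabilities):
    """Diagonal of A: each original test length multiplied by its unreliability (1 - r)."""
    return [t * (1.0 - r) for t, r in zip(lengths, reliabilities)]


A_first = unreliability_weights([25.0], [0.920])   # approximately [2.00]
```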

3. The first approximation to the altered test lengths. Assume each test length cut in half, as shown in row 3 of Table 4.

4-5. The values shown in row 4 of Table 4. Each element in row 2 of Table 4 is divided by the corresponding element in row 3 of Table 4.

6-7. The matrix $L_1$. To compute the matrix $L_1$, first make up a matrix as follows: using the R matrix of Table 1, the value of each diagonal element is increased by adding to it the value of the corresponding element in row 4 of Table 4. For example, the first diagonal element of the new matrix is .920 + .160 = 1.080. The $L_1$ matrix is obtained by premultiplying the matrix $a_c$ of Table 3 by the inverse of the augmented R matrix. The procedure for premultiplying a matrix by the inverse of a symmetric matrix is outlined in [6]. The solution is found in two stages, the "forward solution" and the "backward solution," both of which may be seen in [4]. In this report only the backward solution is reproduced, in Table 5, with the transpose of the $L_1$ matrix in the upper left section.

[TABLE 1. The R Matrix of Predictor Intercorrelations with Reliabilities Substituted for Unities in the Diagonal]
[TABLE 2. The $r_c$ Matrix of Validity Coefficients]
[TABLE 3. The $a_c'$ Matrix: Validity Coefficients Expressed in Deviation Form for Each Test]
[TABLE 4. Row Vector of Original Test Lengths, ..., and Row Vector of First Approximations to Optimal Test Lengths in Minutes]
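Steps 3 through 7 can be sketched as follows, under stated assumptions:

- The first approximation is taken to be the original lengths scaled proportionally to the new total (halving, in the example).
- A plain Gaussian elimination stands in for the forward and backward solution of [6].
- The names `solve` and `first_approximation` are hypothetical.

The values .920, .159, 2.00, 25 and 9 come from the text and tables. The second test's reliability (.920) and A value (.72, i.e., 9 × .08), and the column `a`, are assumptions for illustration.

```python
def solve(M, B):
    """Solve M X = B by Gaussian elimination with partial pivoting (M: n x n, B: n x k)."""
    n, k = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + k):
                aug[r][c] -= f * aug[col][c]
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            s = aug[i][n + j] - sum(aug[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / aug[i][i]
    return X


def first_approximation(R, A, lengths, T, a):
    """Steps 3-7: proportional first lengths D_1, quotients A/D_1, augmented R, and L_1."""
    T0 = sum(lengths)
    D1 = [t * T / T0 for t in lengths]            # step 3 (halving when T = T0 / 2)
    quotients = [x / d for x, d in zip(A, D1)]    # steps 4-5: row 4 of Table 4
    n = len(R)
    augmented = [[R[i][j] + (quotients[i] if i == j else 0.0) for j in range(n)]
                 for i in range(n)]               # step 6: R with augmented diagonal
    L1 = solve(augmented, a)                      # step 7: (R + A D_1^{-1})^{-1} a_c
    return D1, quotients, augmented, L1


R = [[0.920, 0.159], [0.159, 0.920]]
A = [2.00, 0.72]
a = [[0.10], [-0.10]]   # hypothetical a_c column, one row per test
D1, q, aug, L1 = first_approximation(R, A, [25.0, 9.0], 17.0, a)
```

The first diagonal element of `aug` is .920 + .160 = 1.080, as in step 6-7.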

8. The second approximation to the altered test lengths. The computational procedure is that used in [4] and is shown in rows a through f in Table 5.

Row a consists of the sums of squares of column elements of the $L_1'$ matrix. For example, the first element in row a, .0626, is the sum of squares of the first 10 elements in column 1 of Table 5.

Row b is copied from row 2 of Table 4. 

Row c consists of the products of corresponding elements in the two preceding rows. For example, the first element in row c, .1251, is .0626 × 2.00. (In the original computations, six decimals were retained in the elements of row a.)

Row d consists of the square roots of the corresponding elements in the preceding row. For example, the first element is √.1251 = .3537. The value of s, as seen to the right of this row, is computed as the over-all new testing time, 71 minutes, divided by the sum of elements in row d, 1.8823. The quotient is 37.7198.

Row e gives a check on the computations for row d. Each element in row c is divided by the corresponding element in row d. Thus, .1251/.3537 = .3537.

Row f has as elements the second approximations to optimal test lengths. These values are found by multiplying each element in row d by the obtained value of s. Thus, for the first element, .3537 × 37.7198 = 13.3415. Summed, the values in row f should equal 71, the over-all new testing time in minutes.
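Rows a through f of step 8 reduce to a short function. In this sketch (function name hypothetical), `L` is stored with one row per test, i.e., as the transpose of the layout of Table 5, so each row-a element is a row sum of squares. The two-test input is invented so that the arithmetic comes out exactly.

```python
import math


def next_lengths(L, A, T):
    """Rows a-f of step 8. L has one row per test (the transpose of Table 5's layout)."""
    row_a = [sum(x * x for x in row) for row in L]      # sums of squares per test
    row_c = [a * b for a, b in zip(row_a, A)]           # row b is the diagonal of A
    row_d = [math.sqrt(c) for c in row_c]
    s = T / sum(row_d)                                  # scale factor s
    row_e = [c / d if d > 0 else 0.0 for c, d in zip(row_c, row_d)]  # check: equals row d
    row_f = [s * d for d in row_d]                      # new lengths, summing to T
    return row_d, s, row_e, row_f


# Hypothetical two-test example chosen so the arithmetic is exact.
d, s, e, f = next_lengths([[3.0, 4.0], [0.0, 1.0]], [1.0, 4.0], 7.0)
```

With the paper's figures the same arithmetic gives s = 71/1.8823 ≈ 37.72 and .3537 × 37.7198 ≈ 13.3415, matching rows d and f.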

Since there are no near-zero values in row f, normally one would continue in the manner described in [4] to obtain the third approximation to altered test lengths. In terms of the present report, one would substitute the values in row f of Table 5 for those in row 3 of Table 4, and repeat steps 4-5 through 8f to compute the third approximation to optimal test lengths.

Assume, on the contrary, that some test length as given in row f of Table 5 were near-zero or zero. Under these conditions it would be difficult or impossible, in the succeeding iteration, to carry out the computations indicated in step 4-5. The modified procedure described below avoids such an impasse. This procedure may be employed with complete generality. The calculations in Table 5 are completed as follows:

Row g consists of the square roots of the corresponding elements in the preceding row. Thus, for the first element, √13.3415 = 3.6526.

Row h gives a check on the calculation of row g. Each element in row f is divided by the corresponding element in row g. For example, 13.3415/3.6526 = 3.6526.

The elements of row g will be used subsequently for a number of opera- 
tions. 
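Rows g and h can be sketched the same way (function name hypothetical). The second input is a zero length. It is included to show that rows g and h remain defined when a test's allotment vanishes, which is the point of the modified procedure.

```python
import math


def root_lengths(row_f):
    """Rows g and h: square roots of the new lengths, and the check f / g (which should equal g)."""
    row_g = [math.sqrt(f) for f in row_f]
    # A zero length gives g = 0; the check is skipped there instead of dividing by zero.
    row_h = [f / g if g > 0 else 0.0 for f, g in zip(row_f, row_g)]
    return row_g, row_h


g, h = root_lengths([13.3415, 0.0])
```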


[TABLE 5. Computation of $L_1'$ (Backward Solution) from [4], and Computations for the Second Approximation to Optimal Test Lengths]


9. The matrix shown in Table 6 is calculated next. Each column of 
Table 6 is obtained by multiplying each element in the corresponding column 
of the R matrix shown in Table 1 by the corresponding element in row g. 
For example, for the first column of Table 6, the first two elements are: .920 × 3.6526 = 3.3604; .159 × 3.6526 = .5808.


TABLE 6
The $R D_2^{1/2}$ Matrix

            1         2         3         4         5         6         Σ
 1     3.3604     .3729     .5429     .9962    2.7033    1.9462    9.9219
 2      .5808    2.1575     .0107    1.3082    1.0345     .9183    6.0101
 3      .5552     .0070    3.2860     .7090     .5031    -.5669    4.4934
 4     1.0264     .8653     .7143    2.9071    1.9451    1.6099    9.0681
 5     2.7869     .6848     .5072    1.9463    2.9407    2.3732   11.2391
 6     1.8811     .5699    -.5358    1.5103    2.2250    3.2499    8.9004
 Σ    10.1908    4.6574    4.5253    9.3771   11.3518    9.5306   49.6330
Ck    10.1908    4.6574    4.5253    9.3771   11.3518    9.5306   49.6330
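Step 9 is a column scaling of R. This sketch (function name hypothetical) uses a 2 × 2 matrix built from the values quoted in the text (.920 and .159) plus an assumed second reliability of .920. It takes g2 = 2.3452 to be √5.50, the second test's second-approximation length. The first column reproduces the two products quoted in step 9.

```python
def scale_columns(R, g):
    """Step 9: multiply column j of R by g[j], giving R D^{1/2}."""
    return [[R[i][j] * g[j] for j in range(len(g))] for i in range(len(R))]


R = [[0.920, 0.159], [0.159, 0.920]]
g = [3.6526, 2.3452]
RD = scale_columns(R, g)
```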


TABLE 7
The $D_2^{1/2} R D_2^{1/2}$ Matrix

            1         2         3         4         5         6         Σ        Ck
 1     12.274     1.362     1.983     3.639     9.874     7.109    36.241    36.241
 2      1.362     5.060      .025     3.068     2.426     2.154    14.095    14.094
 3      1.983      .025    11.737     2.532     1.797    -2.025    16.049    16.049
 4      3.639     3.068     2.532    10.306     6.896     5.707    32.148    32.148
 5      9.874     2.426     1.797     6.896    10.419     8.408    39.820    39.820
 6      7.109     2.154    -2.025     5.707     8.408    12.281    33.634    33.635

10. Computed next is the matrix found in Table 7. Each row of Table 7 is obtained by multiplying each element in the corresponding row of the table computed in step 9 by the corresponding element in row g. For example, elements one and two of row 1 of Table 7 are: 3.3604 × 3.6526 = 12.274; .3729 × 3.6526 = 1.362.

11. Calculate a matrix which shall be designated $A_2$. The A values found in row 2 of Table 4 are added to the corresponding diagonal elements of the table obtained in step 10, and the resulting matrix is copied into the upper left quadrant of Table 8. The first diagonal element of Table 8 is 2.00 + 12.274 = 14.274. Note that the elements below the diagonal are not copied in.

12. The diagonal elements in the upper right quadrant of Table 8 are the corresponding elements of row g.

13. Next compute the inverse of the $A_2$ matrix, postmultiplied by the diagonal matrix in the upper right quadrant of Table 8. The procedure used is identical with that previously mentioned in connection with computing $L_1$, and is outlined in ([6], Ch. 21, Sec. 7). Computations for the forward solution are shown in the lower quadrants of Tables 8 and 9. The backward solution is shown in Table 10, which gives the transpose of the desired product matrix in the first six columns.
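Steps 10 and 11 combine into one expression, $D_2^{1/2} R D_2^{1/2} + A$. The two-test values below are the first two tests' figures quoted in the text: .920 and .159 from R, g values 3.6526 and 2.3452, and A values 2.00 and .72. The second test's reliability and A value are assumptions, and the function name is hypothetical. The first diagonal element should come out near the 14.274 quoted in step 11, and the off-diagonal element near the 1.362 of step 10.

```python
def build_A2(R, g, A):
    """Steps 10-11: scale row i and column j of R by g[i] and g[j], then add A to the diagonal."""
    n = len(R)
    return [[g[i] * R[i][j] * g[j] + (A[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


A2 = build_A2([[0.920, 0.159], [0.159, 0.920]], [3.6526, 2.3452], [2.00, 0.72])
```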
TABLE 8
Computation of $A_2^{-1} D_2^{1/2}$, where $A_2 = D_2^{1/2} R D_2^{1/2} + A$

Forward Solution

                1A      2A      3A      4A      5A      6A      1B      2B      3B      4B      5B      6B   Check       Σ
1A          14.274   1.362   1.983   3.639   9.874   7.109   3.653                                          41.894  41.894
2A                   5.780    .025   3.068   2.426   2.154           2.345                                  17.160  17.160
3A                          14.137   2.532   1.797  -2.025                   3.572                          22.021  22.021
4A                                  14.446   6.896   5.707                           3.545                  39.833  39.833
5A                                          12.969   8.408                                   3.543          45.913  45.913
6A                                                  17.881                                           3.779  43.013  43.013
Σ           38.241  14.815  18.449  36.288  42.370  39.234   3.653   2.345   3.572   3.545   3.543   3.779 209.834
.0701 1     14.274   1.362   1.983   3.639   9.874   7.109   3.653                                          41.895  41.894
.1770 2              5.651   -.163   2.722   1.484   1.479   -.347   2.345                                  13.180  13.175
.0722 3                     13.857   2.105    .468  -2.971   -.518    .068   3.572                          16.580  16.581
      4                             11.886   3.590   3.633   -.686  -1.181   -.543   3.545                  20.279  20.284
.2153 5                                      4.645   2.103  -2.212   -.278    .003  -1.071   3.543           6.763   6.777
.0889 6                                             11.253   -.627   -.127    .911   -.600  -1.605   3.779  12.969  12.984


TABLE 9
Computation of $A_2^{-1} D_2^{1/2}$ (Continued)

 -1.000     -.095     -.139     -.255     -.692     -.498
           -1.000      .029     -.482     -.263     -.262
                     -1.000     -.152     -.034      .214
[...]


14. Each column of the matrix obtained in the backward solution now is multiplied by the corresponding element of row g. For the first element in column 1, we have .576 × 3.6526 = 2.104. The resulting matrix, with corresponding off-diagonal elements averaged to make the matrix perfectly symmetrical, is shown in Table 11.


TABLE 11
Computation of $D_2^{1/2} A_2^{-1} D_2^{1/2}$

            1         2         3         4         5         6         Σ
 1      2.104      .057     -.172      .339    -1.599     -.212      .517
 2       .057     1.100      .056     -.270     -.192     -.042      .709
 3      -.172      .056     1.022     -.220     -.098      .306      .891
 4       .339     -.270     -.220       ...     -.732       ...      .253
 5     -1.599     -.192     -.098     -.732     2.934     -.540     -.227
 6      -.212     -.042      .306       ...     -.540       ...      .581
 Σ       .517      .709      .890      .253     -.227      .581     2.727
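Steps 13 and 14 compute $D_2^{1/2} A_2^{-1} D_2^{1/2}$ and then symmetrize it. In this sketch, Gaussian elimination replaces the forward and backward solutions of Tables 8 through 10. Averaging the matrix with its transpose mirrors the averaging of off-diagonal elements described in step 14. The function names are hypothetical, and the 2 × 2 $A_2$ borrows the leading entries 14.274, 1.362 and 5.780 purely for illustration.

```python
def solve(M, B):
    """Solve M X = B by Gaussian elimination with partial pivoting (M: n x n, B: n x k)."""
    n, k = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + k):
                aug[r][c] -= f * aug[col][c]
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            s = aug[i][n + j] - sum(aug[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / aug[i][i]
    return X


def symmetric_core(A2, g):
    """Steps 13-14: D^{1/2} A2^{-1} D^{1/2}, with each off-diagonal pair averaged."""
    n = len(A2)
    diag_g = [[g[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    X = solve(A2, diag_g)                                         # step 13: A2^{-1} D^{1/2}
    M = [[g[i] * X[i][j] for j in range(n)] for i in range(n)]    # step 14: premultiply by D^{1/2}
    return [[(M[i][j] + M[j][i]) / 2.0 for j in range(n)] for i in range(n)]


A2_small = [[14.274, 1.362], [1.362, 5.780]]
g2 = [3.6526, 2.3452]
M2 = symmetric_core(A2_small, g2)
M1 = symmetric_core([[14.274]], [3.6526])
```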


TABLE 12
Computation of $L_2 = [D_2^{1/2} A_2^{-1} D_2^{1/2}]\,a_c$

            1         2         3         4         5         6
 1       .051     -.053      .073      .051     -.064      .000
 2      -.082      .036      .018      .018       ...      .022
 3       .003     -.005     -.003     -.009      .024     -.019
 4       .131     -.003     -.076     -.062      .054      .052
 5       .070      .093     -.108     -.009      .229      .101
 6      -.175     -.053      .100     -.058      .080     -.055
 7       .001     -.041     -.008     -.092      .120     -.052
 8      -.065      .104     -.085      .086     -.026      .003
 9       .076     -.069      .075      .005      .006      .010
10      -.005     -.022      .012      .073     -.014      .028
 Σ       .005     -.003     -.002      .003      .005       ...
Ck       .005     -.002     -.004      .002       ...       ...


15. Next compute, by successive columns, the matrix $L_2'$, which is shown in the first ten rows of Table 12. The ith element in the first column of $L_2'$ is the product sum of the elements in the first row of Table 11 by the corresponding elements in the ith row of the $a_c'$ matrix in Table 3. For example, the first element in the first column of $L_2'$ is (2.104)(.023) + (.057)(-.047) + (-.172)(.089) + (.339)(.033) + (-1.599)(-.004) + (-.212)(-.016) = .051. The second column of $L_2'$ is obtained in the same manner as the first, except that the second row of Table 11 is used instead of the first. To obtain the third column of $L_2'$, the third row of Table 11 is used, and so on until the table is completed.

16. Step 8 now is repeated, rows a through f, using the $L_2'$ matrix to obtain a third approximation to the altered test lengths (i.e., a new row f). These computations are not reproduced here, but the values obtained in row f may be seen in the third row of Table 13.
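Step 15's product sums can be checked directly against the worked example. `table11_row1` and `a_row1` are the six pairs quoted in step 15. The function names and the tiny second example are hypothetical. Because the matrix of Table 11 is symmetric, using its row j is the same as using its column j.

```python
def product_sum(u, v):
    """Sum of the products of corresponding elements of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def L_transpose(M, a_rows):
    """Element (i, j) of L' = row j of the symmetric matrix M (Table 11) times row i of a_c' (Table 3)."""
    return [[product_sum(M[j], a_row) for j in range(len(M))] for a_row in a_rows]


# The six pairs quoted in step 15.
table11_row1 = [2.104, 0.057, -0.172, 0.339, -1.599, -0.212]
a_row1 = [0.023, -0.047, 0.089, 0.033, -0.004, -0.016]
first_element = product_sum(table11_row1, a_row1)   # about .051

# Hypothetical 2-test, 1-criterion case.
small = L_transpose([[1.0, 0.0], [0.0, 2.0]], [[3.0, 4.0]])
```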
TABLE 13
Differential Prediction: Successive Approximations* to $\mathbf{1}'D_t$, for $T_t = \tfrac{1}{2}T_0 = 71$

                    1       2       3       4       5       6       Σ   Value of φ for
Approx'n                                                                Successive Values of L
(0.5)1'D_a: 1   12.50    4.50   15.00   11.50    7.50   20.00   71.00    L_1   .227
            2   13.34    5.50   12.76   12.57   12.55   14.28   71.00    L_2   .234
            3   13.20    5.31   11.57   12.56   16.06   12.32   71.02    L_3   .236
            4   13.20    5.18   11.04   12.51   17.63   11.44   71.00    L_4   .237
            5   13.26    5.12   10.82   12.56   18.15   11.09   71.00    L_5   .235
            6   13.31    5.12   10.75   12.55   18.37   10.91   71.01    L_6   .238

*The third and subsequent approximations were computed by the procedure described for the general case.


TABLE 14
Differential Prediction: Successive Approximations to $\mathbf{1}'D_t$, for $T_t = \tfrac{1}{2}T_0 = 71$
From [4, p. 60]

                    1       2       3       4       5       6       Σ   Value of φ for
Approx'n                                                                Successive Values of L
(0.5)1'D_a: 1   12.50    4.50   15.00   11.50    7.50   20.00   71.00    L_1   .227
            2   13.34    5.50   12.76   12.57   12.55   14.28   71.00    L_2   .234
            3   13.27    5.32   11.55   12.52   16.05   12.29   71.00    L_3   .235
            4   13.23    5.20   10.98   12.47   17.62   11.51   71.01    L_4   .236
            5   13.31    5.15   10.76   12.46   18.13   11.19   71.00    L_5   .237
            6   13.35    5.12   10.70   12.46   18.37   11.00   71.00    L_6   .236


Computations for the fourth approximation may be summarized as follows:

1. A new row g is computed, giving the square roots of the corresponding values in the new row f.
2. Steps 9 through 11 are repeated, using the new values in row g, to compute the matrix $A_3$.
3. Steps 12 through 15 are repeated to compute $L_3'$.
4. Step 16 is repeated to obtain the fourth approximation.

Thus, given any approximation, row g and steps 9 through 16 designate the procedure which may be used with complete generality to compute subsequent approximations to optimal test lengths for differential prediction.

In all, five approximations beyond the first were computed and are summarized in Table 13. Of these, the second was computed by the procedure presented in [4]; approximations three through six were calculated by the procedure described for the general case.
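Putting steps 9 through 16 together gives one cycle of the general procedure. The sketch below is a restatement in Python under stated assumptions, not the authors' hand-computation scheme. It uses Gaussian elimination instead of the forward and backward solutions, and the names `general_iteration` and `optimal_lengths` are hypothetical. Only square roots of the lengths appear, so a test whose length reaches zero causes no division by zero.

```python
import math


def solve(M, B):
    """Solve M X = B by Gaussian elimination with partial pivoting (M: n x n, B: n x k)."""
    n, k = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + k):
                aug[r][c] -= f * aug[col][c]
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            s = aug[i][n + j] - sum(aug[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / aug[i][i]
    return X


def general_iteration(R, A, a, D, T):
    """One cycle: L from the current lengths D (no reciprocals of D), then new lengths summing to T."""
    n, k = len(R), len(a[0])
    g = [math.sqrt(d) for d in D]                                     # row g
    A2 = [[g[i] * R[i][j] * g[j] + (A[i] if i == j else 0.0) for j in range(n)]
          for i in range(n)]                                          # steps 9-11
    X = solve(A2, [[g[i] * a[i][c] for c in range(k)] for i in range(n)])
    L = [[g[i] * X[i][c] for c in range(k)] for i in range(n)]        # D^{1/2} A2^{-1} D^{1/2} a
    d = [math.sqrt(sum(x * x for x in L[i]) * A[i]) for i in range(n)]  # rows a-d
    if sum(d) == 0.0:
        raise ValueError("all rows of L are zero; lengths cannot be rescaled")
    s = T / sum(d)
    return [s * di for di in d], L                                    # row f


def optimal_lengths(R, A, a, D1, T, cycles=5):
    """Repeat the cycle a fixed number of times, starting from the first approximation D1."""
    D = list(D1)
    for _ in range(cycles):
        D, _ = general_iteration(R, A, a, D, T)
    return D


# Hypothetical example: test 2 has no validity, so its allotment is driven to zero
# and later cycles still run without dividing by that zero length.
D_zero = optimal_lengths([[0.9, 0.0], [0.0, 0.9]], [1.0, 1.0], [[0.3], [0.0]], [5.0, 5.0], 10.0)

# Hypothetical correlated example: the total time is preserved at every cycle.
D_mixed = optimal_lengths([[0.9, 0.3], [0.3, 0.8]], [2.0, 1.0],
                          [[0.2, -0.2], [0.1, -0.1]], [6.0, 4.0], 10.0)
```

A fixed number of cycles is used only to keep the sketch short. In practice one would stop once successive length vectors agree to the precision wanted, as in Table 13.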

17. Successive indices of differential prediction efficiency, $\phi_i$, are computed as follows:

(a) To obtain $\phi_1$, corresponding to the first approximation to the altered test lengths, each element in the $L_1'$ matrix is multiplied by the corresponding element in the $a_c'$ matrix in Table 3, and all products are summed. The resulting value, .227, is found as the first entry in the φ column at the extreme right in Table 13.

(b) The value of $\phi_2$ is obtained in the same manner except that the $L_2'$ matrix is used.

(c) Subsequent values, $\phi_i$, are obtained by using the elements of $L_i'$ and the corresponding elements of $a_c'$ in Table 3.

Table 14 shows the results obtained in [4] by the original procedure, for the same data and with the over-all new testing time also equal to one-half the original time. Comparison of the corresponding values in Tables 13 and 14 indicates results essentially the same for all practical purposes. The largest discrepancy does not exceed one-tenth of a minute, and the increases in φ, though small, are comparable within the range of rounding errors. In neither case have computations been carried to the point of complete stabilization of the vector of time allotments. Results, however, appear adequate for practical purposes.
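Step 17's index is a single sum of element-by-element products. The sketch (function name hypothetical) uses an invented 2 × 2 case. Under the modifications listed below for multiple absolute prediction, the same function applied to $L_i'$ and $r_c'$ would give λ.

```python
def prediction_index(L_t, a_t):
    """Step 17: sum over all cells of the element-by-element product of L' and a_c'."""
    return sum(x * y for L_row, a_row in zip(L_t, a_t) for x, y in zip(L_row, a_row))


phi = prediction_index([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.0], [0.0, 0.25]])
```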


The question may arise as to the stability of the time estimates from sample to sample. The entire problem of significance tests, however, has not yet been touched.


The General Case for Multiple Absolute Prediction 

The computational procedure presented in [5] for obtaining optimal test length for multiple absolute prediction consists of the same sequence of operations as that given in [4]. The difference is that in [5] the matrix of validity coefficients is used, whereas in [4] these coefficients were required in deviation form for each test.

Similarly, the sequence of operations for the general case for multiple absolute prediction is the same as that presented above, the difference being that the matrix $r_c$ is used, whereas $a_c$ was required above. Instead of presenting a numerical example in detail for the general case for multiple absolute prediction, here only the procedural steps which differ from those described above will be indicated. Namely:

Step 1 is omitted.

In the step designated 6-7, the $r_c$ matrix is used instead of matrix $a_c$.

In steps 15 and 17, the $r_c'$ matrix is used instead of matrix $a_c'$.

The above distinctions assume, as was assumed for the general case for differential prediction, that the second approximation to altered time allotments was computed by the original method.

The series of approximations to optimal test lengths for maximum 
absolute prediction, shown in Table 15, was obtained with the same original 
data as the series in the previous example, but with the over-all testing time 
taken as unchanged. Hence the original test lengths were taken as the first 
approximation. 

To demonstrate that the procedure developed for the general case yields the same results as the procedure presented in [5], the procedure described for the general case was employed immediately. The square roots of the original test lengths were found at once, as designated by step 8, row g, of the preceding example. Further procedural steps observed the directions given in the steps subsequent to step 8, row g, with the exception, of course, that in steps 15 and 17 the $r_c'$ matrix was used instead of $a_c'$. Three approximations beyond the original were computed. These, with the corresponding values of the index of multiple absolute prediction efficiency, λ, are shown in Table 15. The corresponding results obtained by the original method are found in Table 16. A comparison of the two tables shows no discrepancy in the entire series greater than one-tenth of a minute, and no difference between the corresponding values of λ beyond those of rounding errors.

TABLE 15
Absolute Prediction: Successive Approximations* to $\mathbf{1}'D_t$, for $T_t = T_0 = 142$

                    1       2       3       4       5       6       Σ   Value of λ for
Approx'n                                                                Successive Values of L
(1.0)1'D_a: 1   25.00    9.00   30.00   23.00   15.00   40.00  142.00    L_1  2.203
            2   32.53   10.02   10.50   21.40   18.65   48.89  141.99    L_2  2.229
            3   32.82    9.67    8.30   20.31   21.60   49.30  142.00    L_3  2.230
            4   32.79    9.53    7.60   19.96   23.03   49.05  142.00    L_4  2.230

*The second, third and fourth approximations were computed by the procedure described for the general case.

TABLE 16
Absolute Prediction: Successive Approximations to $\mathbf{1}'D_t$, for $T_t = T_0 = 142$
From [5, p. 120]

                    1       2       3       4       5       6       Σ   Value of λ for
Approx'n                                                                Successive Values of L
(1.0)1'D_a: 1   25.00    9.00   30.00   23.00   15.00   40.00  142.00    L_1  2.203
            2   32.54   10.00   10.45   21.42   18.66   48.93  142.00    L_2  2.230
            3   32.87    9.70    8.21   20.21   21.61   49.40  142.00    L_3  2.234
            4   32.76    9.52    7.57   19.99   23.08   49.08  142.00    L_4  2.232


Mathematical Derivation 


The General Case for Differential Prediction 


The mathematical rationale presented in [4] provides a solution for obtaining optimal test lengths by means of a series of approximations. The formulas derived are not applicable, however, in the event that the altered time allotment for any test approaches zero. The derivation which follows consists in developing, from the computational equations presented in [4], formulas which do not involve reciprocals of the altered time allotments. These formulas consequently provide a solution whose applicability is perfectly general. Using the notation of [4], let


$n$ = the number of predictors,

$N$ = the number of criteria,

$r$ = the $(n \times n)$ matrix of intercorrelations of tests of original lengths,

$r_c$ = the $(n \times N)$ matrix of validity coefficients for the tests of original lengths,

$D_a$ = the $(n \times n)$ diagonal matrix of original test lengths,

$D_t$ = the $(n \times n)$ diagonal matrix of altered test lengths,

$D_{rr}$ = the $(n \times n)$ diagonal matrix of reliability coefficients for the tests of original lengths.


As in [4], define

$$R = r - (I - D_{rr}),$$

$$a_c = r_c\left(I - \frac{1}{N}\mathbf{1}\mathbf{1}'\right),$$

$$A = D_a(I - D_{rr}),$$

and again state the constraining condition, $T = \mathbf{1}'D_t\mathbf{1}$, where $\mathbf{1}$ is a vector of unities.

Start with equations (43) and (44) of [4], namely, the equations from which the formulas for the iterative solution for $D_t$ were derived. These are, respectively,

$$D_t = \frac{T\,(D_{LL'}A)^{1/2}}{\mathbf{1}'(D_{LL'}A)^{1/2}\mathbf{1}} \tag{1}$$




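and

$$L = (R + A D_t^{-1})^{-1} a_c, \tag{2}$$

where $D_{LL'}$ is a diagonal matrix whose non-zero elements are the diagonal elements of $LL'$. Equation (2) may also be expressed as

$$L = \left(D_t^{-1/2} D_t^{1/2} R D_t^{1/2} D_t^{-1/2} + D_t^{-1/2} A D_t^{-1/2}\right)^{-1} a_c, \tag{3}$$

or equivalently, as

$$L = \left[D_t^{-1/2}\left(D_t^{1/2} R D_t^{1/2} + A\right) D_t^{-1/2}\right]^{-1} a_c, \tag{4}$$

or finally, as

$$L = D_t^{1/2}\left(D_t^{1/2} R D_t^{1/2} + A\right)^{-1} D_t^{1/2} a_c, \tag{5}$$

an equation which involves no negative powers of $D_t$.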
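The step from (2) to (5) rests on the identity $R + A D_t^{-1} = D_t^{-1/2}(D_t^{1/2} R D_t^{1/2} + A) D_t^{-1/2}$, which holds because diagonal matrices commute. A quick numerical check follows, with arbitrary illustrative matrices and hypothetical function names. When every length is positive, the two forms agree. When a length is zero, only form (5) remains computable.

```python
import math


def solve(M, B):
    """Solve M X = B by Gaussian elimination with partial pivoting (M: n x n, B: n x k)."""
    n, k = len(M), len(B[0])
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + k):
                aug[r][c] -= f * aug[col][c]
    X = [[0.0] * k for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(k):
            s = aug[i][n + j] - sum(aug[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / aug[i][i]
    return X


def L_form2(R, A, D, a):
    """Equation (2): (R + A D^{-1})^{-1} a; needs every length in D to be non-zero."""
    n = len(R)
    M = [[R[i][j] + (A[i] / D[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(M, a)


def L_form5(R, A, D, a):
    """Equation (5): D^{1/2} (D^{1/2} R D^{1/2} + A)^{-1} D^{1/2} a; no negative powers of D."""
    n, k = len(R), len(a[0])
    g = [math.sqrt(d) for d in D]
    A2 = [[g[i] * R[i][j] * g[j] + (A[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    X = solve(A2, [[g[i] * a[i][c] for c in range(k)] for i in range(n)])
    return [[g[i] * X[i][c] for c in range(k)] for i in range(n)]


# Arbitrary illustrative values, not taken from the paper.
R = [[0.90, 0.30, 0.10], [0.30, 0.80, 0.20], [0.10, 0.20, 0.85]]
A = [1.0, 0.5, 2.0]
a = [[0.10, -0.10], [0.20, 0.00], [-0.05, 0.05]]

L2 = L_form2(R, A, [4.0, 2.0, 9.0], a)
L5 = L_form5(R, A, [4.0, 2.0, 9.0], a)
max_diff = max(abs(x - y) for r2, r5 in zip(L2, L5) for x, y in zip(r2, r5))

# With a zero length, form (2) would divide by zero; form (5) simply yields a zero row.
L5_zero = L_form5(R, A, [4.0, 0.0, 9.0], a)
```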
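Let

$$L_i = D_i^{1/2}\left(D_i^{1/2} R D_i^{1/2} + A\right)^{-1} D_i^{1/2} a_c, \tag{6}$$

where

$$D_1 = \frac{T}{\mathbf{1}'D_a\mathbf{1}}\,D_a \tag{7}$$

and

$$D_{i+1} = \frac{T\,(D_{L_iL_i'}A)^{1/2}}{\mathbf{1}'(D_{L_iL_i'}A)^{1/2}\mathbf{1}}. \tag{8}$$

The first approximation to $D_t$ is indicated by (7). The second and all subsequent approximations to $D_t$ may be obtained by an iterative procedure based on (6) and (8). In this manner, successive approximations to $L_i$ and $D_i$ may be computed until $D_t$ stabilizes satisfactorily.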


The General Case for Multiple Absolute Prediction 


By an analogous development, it can be shown that for the general case for multiple absolute prediction the formula for successive approximations to $L$ is

$$L_i = D_i^{1/2}\left(D_i^{1/2} R D_i^{1/2} + A\right)^{-1} D_i^{1/2} r_c, \tag{9}$$

and that the formulas for obtaining the first and subsequent approximations to $D_t$ are identical with (7) and (8) above.


REFERENCES

[1] Horst, P. Determination of optimal test length to maximize the multiple correlation.

[2] Horst, P. A technique for the development of a differential prediction battery. Psychol. Monogr., 1954, 68, No. 9 (Whole No. 380).

[3] Horst, P. A technique for the development of a multiple absolute prediction battery. Psychol. Monogr., 1955, 69, No. 5 (Whole No. 390).

[4] Horst, P. Optimal test length for maximum differential prediction. Psychometrika, 1956, 21, 51-66.

[5] Horst, P. and MacEwan, Charlotte. Optimal test length for maximum absolute prediction. Psychometrika, 1956, 21, 111-124.

[6] Horst, P. Servant of the human sciences. Unpublished manuscript. Division of Counseling and Testing Services, Univ. of Washington, May 1953.


Manuscript received 8/18/57 


PSYCHOMETRIKA—VOL. 22, NO. 4 
DECEMBER, 1957 


STIMULUS AND RESPONSE GENERALIZATION: A STOCHASTIC 
MODEL RELATING GENERALIZATION TO DISTANCE IN 
PSYCHOLOGICAL SPACE* 


ROGER N. SHEPARD


NAVAL RESEARCH LABORATORY†


A mathematical model is developed in an attempt to relate errors in multiple stimulus-response situations to psychological inter-stimulus and inter-response distances. The fundamental assumptions are (a) that the stimulus and response confusions go on independently of each other, (b) that the probability of a stimulus confusion is an exponential decay function of the psychological distance between the stimuli, and (c) that the probability of a response confusion is an exponential decay function of the psychological distance between the responses. The problem of the operational definition of psychological distance is considered in some detail.


Stochastic models for learning have been developed by Estes [8], by Bush and Mosteller [6], and others. With the exception of a few investigations confined to the stimulus side of the learning process, such as that by Bush and Mosteller [5], however, these models have not been extensively applied to generalization phenomena. This paper, in using the notion of psychological distance, approaches the generalization problem from a somewhat different direction.


Consideration will be restricted to situations in which a number of responses are discriminatively attached to a number of stimuli by consistent application of differential reinforcement. More precisely, the learning process will be supposed to conform to the following rules:

(a) On any given trial a single stimulus is presented at random from a set of N stimuli.
(b) On each trial the subject is constrained to a fixed set of N responses.
(c) For any given subject, there is a prevailing one-to-one assignment of the N responses to the N stimuli, arbitrarily determined in advance, such that a reinforcing operation (e.g., the word "correct") is applied if and only if the presentation of a stimulus is followed by the occurrence of its assigned response.

The present model, however, is concerned not with the learning process per se but with the pattern of generalizations exhibited at any one given stage of learning. Furthermore, as an approximation for any given set of stimuli or responses, all subjects are assumed to generalize according to the same pattern.

*[...] of a Ph.D. dissertation [...] carried out [...] Graduate School of Yale University [...] Postdoctoral Associateship [...] National Academy of Sciences [...] particularly indebted to [...] Rosner for their generous [...]. [...] have also been contributed by Drs. G. [...], M. Holland, and H. Glaser.

†Now at the Psychological Laboratories, Harvard University.

Ordinarily there is no necessary or natural correspondence between the stimuli and responses and, indeed, different assignments may be set up for different subjects. It is convenient, therefore, to have a way of referring to the response which has been assigned, for a given subject m, to a given stimulus, $S_i$, without having to ask just which one of the N responses that may be. Accordingly, the following definitions are introduced:

$S_i$ = the ith of the N stimuli $S_1, S_2, \ldots, S_N$;
$R_i$ = the ith of the N responses $R_1, R_2, \ldots, R_N$;
$R_{c(i),m}$ = the response assigned to $S_i$ for subject m;
$S_{c(i),m}$ = the stimulus to which $R_i$ is assigned for subject m.

Thus the set of all stim[...]ences, for any subject m, divides [...] (a) reinforced sequences of the form $S_i \to R_{c(i),m}$, and (b) [...]ced sequences of the form $S_i \to R_{c(k),m}$, with $i \ne k$. [...] will be, for every stimulus and [...] will be followed by the other.

$P_{ik,m} = P[R_k \mid S_i]_m$ = the conditional probability of $R_k$, given $S_i$, for subject m.

$P_{(i)k,m}$, $P_{i(k),m}$, and $P_{(i)(k),m}$ are defined in an analogous manner. Thus

$P_{i(k),m} = P[R_{c(k),m} \mid S_i]_m$ = the conditional probability of $R_{c(k),m}$, given $S_i$, for subject m.

The responses are partitioned so as to be mutually exclusive and exhaustive. The conditional probabilities, therefore, satisfy the requirements

$$\sum_k P_{ik,m} = 1, \qquad P_{ik,m} \ge 0. \tag{1}$$


If the probabilities $P_{i(i).m}$ increase with continued application of the
reinforcing operation, there must result a decrease in some $P_{i(k).m}$ with
$i \neq k$. Although it is known that the probabilities of the various incorrect
responses, called generalization errors, do not in general decay at the same
rate, little advance has been made towards the quantitative understanding
of this aspect of the learning process. About all one has to go on is the
qualitative observation that, at a given stage of learning, the probability,
$P_{i(k).m}$, decreases both with the dissimilarity of $S_i$ and $S_k$, and with the
dissimilarity of $R_{(i).m}$ and $R_{(k).m}$.
The Reduction of the S-R Process to an S-S and R-R Process
An error in which the response assigned to a stimulus, $S_k$, follows the
presentation of another, $S_i$ (as in B of Fig. 1), may be viewed as comprising
two events.

ROGER N. SHEPARD 327

[Figure 1: panels A to F, each showing a column of stimuli (S) on the left and a column of responses (R) on the right.]

FIGURE 1
Different ways of conceptualizing an S-R sequence as generated by [...]. The points on the left stand for the different stimuli and those on the right for the different responses. In A, for example, $S_i$ was presented and followed by its correct response, $R_{(i).m}$. In B, however, $S_i$ was followed by an incorrect response, $R_{(k).m}$.

It may be that the subject confused two stimuli; when $S_i$ was presented,
it was taken to be $S_j$. Now, if $S_j$ is taken to be the stimulus, the response
which should ensue is $R_{(j).m}$. Suppose, however, that the subject also
confused two responses; whereas the tendency was to make $R_{(j).m}$, the
response actually made (according to the external criteria) was $R_{(k).m}$.
In this way an S-R transition may be analyzed into an S-S transition and an
R-R transition. Alternatively, there may be a stimulus confusion without any
response confusion (D), a response confusion without any stimulus confusion
(E), or both stimulus and response confusions which so counteract each other
that the correct response is made (F).
This analysis suggests the following additional definitions:

$P^S_{ik} = P[S_k \mid S_i]$ = the conditional probability that $S_i$, when presented, will be taken to be $S_k$;
$P^R_{ik} = P[R_k \mid R_i]$ = the conditional probability that $R_k$ will be made in place of $R_i$.




The term introduced in the second definition is also the probability that,
when the stimulus is taken to be $S_{(i).m}$, $R_k$ will follow, since the model is set
up so that a stimulus taken to be $S_i$ calls for the response $R_{(k).m}$ with probability

1, if $i = k$,
0, otherwise.

Thus, the conditional probabilities are treated as if the subject (a) knows
the connection between any stimulus and its assigned response but (b) still
makes errors owing to a certain inability to identify the stimulus or reproduce
the response with sufficient accuracy (see Fig. 1). The analysis, then, is
applied to the confusions among the stimuli and among the responses but
not to the associations between the stimuli and the responses.

Strictly speaking, the S-S and R-R transition probabilities pertain to
a short interval of time during the learning process so that a stable state
may be assumed to exist. Over extended periods the probabilities will change
owing to the effects of reinforcement. At any given time, however, they
must satisfy the conditions

(2)  $\sum_k P^S_{ik} = 1, \qquad P^S_{ik} \ge 0,$

(3)  $\sum_k P^R_{ik} = 1, \qquad P^R_{ik} \ge 0.$


The fundamental assumption which will be made here is that, at each 
stage of learning, the response confusions are independent of the stimulus 
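confusions, or, more explicitly, that $P^R_{(j)(k).m}$ does not depend upon which
stimulus was presented (and taken to be $S_j$) on the trial considered. This
assumption implies that

(4)  $P_{i(k).m} = \sum_j P^S_{ij} \, P^R_{(j)(k).m}.$

Since the indices $i$, $j$, and $k$ are allowed to range over the values
$1, 2, \ldots, N$, (4) corresponds to the defining equation for matrix multiplication
so that

(5)  $P_{S(R).m} = P_{SS} \cdot P_{(R)(R).m},$

where, for example,

(6)  $P_{S(R).m} = \begin{bmatrix} P_{1(1).m} & P_{1(2).m} & \cdots & P_{1(N).m} \\ P_{2(1).m} & P_{2(2).m} & \cdots & P_{2(N).m} \\ \vdots & \vdots & & \vdots \\ P_{N(1).m} & P_{N(2).m} & \cdots & P_{N(N).m} \end{bmatrix}.$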




(Dr. Burton S. Rosner has independently proposed essentially the same 
treatment for the S-R process.) 

As the form of the definitions suggests, there is an S-R symmetry in 
the notation such that the content of (4) can be set down in an alternative 
form giving the probability, when $S_{(i).m}$ is presented to subject $m$, that $R_k$
will ensue:

(7)  $P_{(i)k.m} = \sum_j P^S_{(i)(j).m} \, P^R_{jk}.$

The matrix representation becomes

(8)  $P_{(S)R.m} = P_{(S)(S).m} \cdot P_{RR}.$

Equations (5) and (8) can be reduced to a single form without
parentheses around the subscripts. In order to do this, it is convenient to
introduce permutation matrices, $J_m$, with elements $J_{ik.m}$, such that

(9)  $J_{ik.m} = 1$, if $R_k$ is assigned to $S_i$ for subject $m$,
     $J_{ik.m} = 0$, otherwise.

By carrying out the indicated multiplications, the following identities may
be verified:

(10)  $P_{S(R).m} = P_{SR.m} \cdot J'_m,$
      $P_{(S)R.m} = J'_m \cdot P_{SR.m},$
      $P_{(S)(S).m} = J'_m \cdot P_{SS} \cdot J_m,$
      $P_{(R)(R).m} = J_m \cdot P_{RR} \cdot J'_m,$
      $J_m J'_m = J'_m J_m = I.$

Here $J'_m$ is the transpose of $J_m$, and $I$ is the identity matrix.
Using these relations, (5) and (8) can be brought into the form

(11)  $P_{SR.m} = P_{SS} \cdot J_m \cdot P_{RR},$

where $P_{SR.m}$ is the matrix of S-R transition probabilities $P_{ik.m}$.
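Equation (11) can be checked on a toy case. The sketch below is a minimal illustration, assuming hypothetical 3 × 3 matrices P_SS and P_RR and a hypothetical assignment list `assign` (with `assign[i] = k` meaning that R_k is assigned to S_i, using 0-based indices); it builds J_m as in (9), forms P_SS · J_m · P_RR, and compares each entry with the elementwise sum (4).

```python
from itertools import product


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def permutation_matrix(assign):
    """J_m of (9): J[i][k] = 1 if R_k is assigned to S_i (assign[i] == k), else 0."""
    n = len(assign)
    return [[1 if assign[i] == k else 0 for k in range(n)] for i in range(n)]


# Hypothetical S-S and R-R matrices (each row sums to 1) and one subject's assignment.
P_SS = [[0.8, 0.15, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]
P_RR = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.02, 0.08, 0.9]]
assign = [2, 0, 1]  # R_3 assigned to S_1, R_1 to S_2, R_2 to S_3

J = permutation_matrix(assign)
P_SR = matmul(matmul(P_SS, J), P_RR)  # equation (11)

# Cross-check against (4): P_i(k) = sum_j P^S_ij P^R_(j)(k), where R_(j) is R_assign[j];
# P_i(k) is the entry of P_SR in column assign[k].
n = len(assign)
for i, k in product(range(n), repeat=2):
    p_i_k = sum(P_SS[i][j] * P_RR[assign[j]][assign[k]] for j in range(n))
    assert abs(P_SR[i][assign[k]] - p_i_k) < 1e-12
```

Because J_m only relabels columns and rows, each row of P_SR still sums to 1, as (1) requires.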
Now, although $J_m$ is known and $P_{SR.m}$ can be estimated from the experimental
data, neither $P_{SS}$ nor $P_{RR}$ can be directly determined. It was
thought, however, that by assigning the responses to the stimuli in a different
way for each subject, the influence of the response confusions could be
counterbalanced in an average of (11) so that $P_{SS}$ could be solved for in terms
of observable quantities. Suppose, then, that there are $M$ subjects, each with a
different assignment so that, over the set of all $M$ assignments, every pair of
responses is assigned to each pair of stimuli the same number of times in both
of the two possible orders.




Using the simplified notations

(12)  $P_{S(R)} = \frac{1}{M}\sum_m P_{S(R).m},$

(13)  $P_{(R)(R)} = \frac{1}{M}\sum_m P_{(R)(R).m},$

and assuming that all the subjects are essentially alike so that $P_{SS}$ is
independent of $m$, equation (5) may be averaged over all $M$ subjects to yield

$P_{S(R)} = P_{SS} \cdot P_{(R)(R)}.$

Postmultiplying through by $P^{-1}_{(R)(R)}$,

(14)  $P_{SS} = P_{S(R)} \cdot P^{-1}_{(R)(R)}.$


The assumption that all subjects tend to confuse stimuli in accordance 
with the same pattern is analogous to the similar assumption made in order 
to pool data from different subjects in psychological scaling procedures. 
This assumption is probably correct only as a first approximation since the
tendency to confuse any particular pair of stimuli probably depends to some
degree upon the history of discrimination learning associated with that pair.
In order to evaluate the inverse R-R matrix, it may be noted from (10)
that

(15)  $P_{(R)(R).m} = J_m \cdot P_{RR} \cdot J'_m,$

where the assignments are so chosen that the matrices $J_m$ and $J'_m$ select, for
each nondiagonal cell of $P_{(R)(R).m}$, elements $P^R_{ik}$ ($i \neq k$) from every
nondiagonal cell of $P_{RR}$ an equal number of times, as $m$ ranges from 1 to $M$.
Likewise, for each diagonal cell of $P_{(R)(R).m}$, elements $P^R_{ii}$ will be equally
selected from each diagonal cell of $P_{RR}$. Thus, by definition of the matrix
elements $J_{ik.m}$ and $J'_{ik.m}$, averaging over all assignments insures that

(16)  $\frac{1}{M}\sum_m P^R_{(i)(k).m} = \frac{1}{M}\sum_m \sum_j \sum_l J_{ij.m} P^R_{jl} J_{kl.m} = \frac{1}{N(N-1)}\sum_{j \neq l} P^R_{jl} = P^R \quad (i \neq k),$

      $\frac{1}{M}\sum_m P^R_{(i)(i).m} = \frac{1}{M}\sum_m \sum_j \sum_l J_{ij.m} P^R_{jl} J_{il.m} = \frac{1}{N}\sum_j P^R_{jj} = Q^R,$

where $P^R$ is the mean probability that two different responses are confused
and $Q^R$ is the mean probability that a response is not confused with any
other. Since the total probability must be conserved,




(17)  $Q^R + (N-1)P^R = 1.$

One way of understanding the operations represented in (16) is to multiply,
e.g., an arbitrary 3 × 3 matrix (for $P_{RR}$) by all six possible 3 × 3
permutation matrices and their transposes as indicated in (15). The sum of
these products, divided by six, will contain equal nondiagonal elements, $P^R$,
and equal diagonal elements, $Q^R$, as required. A minimum set of permutation
matrices having the necessary properties for $N = 9$ is given in the appendix.
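The averaging argument of (15) to (17) can be verified numerically. This sketch, using a hypothetical 3 × 3 R-R matrix, forms $J_m \cdot P_{RR} \cdot J'_m$ for all six assignments (elementwise, entry (i, k) is $P^R$ at row `assign[i]`, column `assign[k]`) and averages them. It uses all $N!$ permutations, not the minimum set mentioned for $N = 9$, which is not reproduced here.

```python
from itertools import permutations


def permuted(p_rr, assign):
    """Equation (15) elementwise: entry (i, k) of J_m P_RR J'_m is P^R[assign[i]][assign[k]]."""
    n = len(assign)
    return [[p_rr[assign[i]][assign[k]] for k in range(n)] for i in range(n)]


def average_over_assignments(p_rr):
    """Average J_m P_RR J'_m over every one of the N! assignments."""
    n = len(p_rr)
    perms = list(permutations(range(n)))
    total = [[0.0] * n for _ in range(n)]
    for assign in perms:
        m = permuted(p_rr, assign)
        for i in range(n):
            for k in range(n):
                total[i][k] += m[i][k]
    return [[x / len(perms) for x in row] for row in total]


P_RR = [[0.7, 0.2, 0.1], [0.05, 0.9, 0.05], [0.3, 0.1, 0.6]]  # hypothetical, rows sum to 1
n = len(P_RR)
avg = average_over_assignments(P_RR)

# P^R and Q^R as defined under (16).
P_R = sum(P_RR[j][l] for j in range(n) for l in range(n) if j != l) / (n * (n - 1))
Q_R = sum(P_RR[j][j] for j in range(n)) / n
```

Every off-diagonal cell of `avg` equals `P_R`, every diagonal cell equals `Q_R`, and the two satisfy (17).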

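Equation (14) may thus be written in the form

(18)  $P_{SS} = P_{S(R)} \cdot \begin{bmatrix} Q^R & P^R & \cdots & P^R \\ P^R & Q^R & \cdots & P^R \\ \vdots & \vdots & \ddots & \vdots \\ P^R & P^R & \cdots & Q^R \end{bmatrix}^{-1}.$

But the inverse matrix has a simple representation such that

(19)  $P_{SS} = P_{S(R)} \cdot \frac{1}{Q^R - P^R} \begin{bmatrix} 1 - P^R & -P^R & \cdots & -P^R \\ -P^R & 1 - P^R & \cdots & -P^R \\ \vdots & \vdots & \ddots & \vdots \\ -P^R & -P^R & \cdots & 1 - P^R \end{bmatrix},$

as may be verified by using (17) and showing that the product of the original
matrix (with elements $P^R$ and $Q^R$) and its inverse representation
yields the identity matrix.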
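The closed-form inverse in (19) can be checked directly. The sketch below builds the matrix of (18), with $Q^R$ fixed by (17), multiplies it by the representation in (19), and compares the product with the identity; the values $N = 4$ and $P^R = 0.05$ are arbitrary choices for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)] for i in range(n)]


def averaged_rr(n, p_r):
    """Matrix of (18): Q^R on the diagonal, P^R elsewhere, with Q^R taken from (17)."""
    q_r = 1.0 - (n - 1) * p_r
    return [[q_r if i == k else p_r for k in range(n)] for i in range(n)]


def closed_form_inverse(n, p_r):
    """Representation in (19): 1/(Q^R - P^R) times 1 - P^R on the diagonal, -P^R elsewhere."""
    q_r = 1.0 - (n - 1) * p_r
    scale = 1.0 / (q_r - p_r)
    return [[scale * ((1.0 - p_r) if i == k else -p_r) for k in range(n)] for i in range(n)]


N, P_R = 4, 0.05
product_matrix = matmul(averaged_rr(N, P_R), closed_form_inverse(N, P_R))
```

Since $Q^R - P^R = 1 - NP^R$ by (17), the representation breaks down when $P^R = 1/N$, that is, when the responses are completely confused.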

Expanding with respect to the general term in (19), the probability
that $S_i$ will be taken to be $S_k$ is

$P^S_{ik} = \frac{1}{Q^R - P^R}\Big[P_{i(k)} - P^R \sum_j P_{i(j)}\Big].$

Using (1) and (17), this may be reduced to

(20)  $P^S_{ik} = \frac{P_{i(k)} - P^R}{1 - NP^R}.$


Thus, although the response confusions are not eliminated by employing
different S-R assignments, they are consolidated into the single parameter
$P^R$. In principle this parameter could be estimated by extrapolating a fitted
S-R transition probability function (for pairs of stimuli) to the limit of
increasing stimulus dissimilarity. For, by (20),

as $P^S_{ik} \to 0$, $P_{i(k)} \to P^R$.

In practice, if the responses are highly distinctive so that $P^R$ is close to zero,
the probabilities $P_{i(k)}$ can be taken as estimates of the probabilities $P^S_{ik}$
with less R-R probability contamination than would be possible without the
counterbalancing technique. The reason for this will appear in the discussion
of estimation procedures.
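Taken together, (12) to (20) amount to a recovery procedure. The sketch below simulates it with exact, noise-free probabilities: for every one of the $N!$ assignments of a hypothetical 3 × 3 example it computes $P_{i(k).m}$ from (4), averages as in (12), and then applies (20) using only the averaged values and $P^R$. With real data the $P_{i(k).m}$ would be estimated from response frequencies, and $P^R$ would itself have to be estimated.

```python
from itertools import permutations


def recover_p_ss(p_ss, p_rr):
    """Average P_i(k).m over all N! assignments, as in (12), then apply (20)."""
    n = len(p_ss)
    assignments = list(permutations(range(n)))  # assign[i] = index of R_(i).m
    avg = [[0.0] * n for _ in range(n)]
    for assign in assignments:
        for i in range(n):
            for k in range(n):
                # (4): P_i(k).m = sum_j P^S_ij P^R_(j)(k).m, with R_(j).m = R_assign[j]
                p_i_k = sum(p_ss[i][j] * p_rr[assign[j]][assign[k]] for j in range(n))
                avg[i][k] += p_i_k / len(assignments)
    # P^R: mean probability that two different responses are confused, as under (16).
    p_r = sum(p_rr[j][l] for j in range(n) for l in range(n) if j != l) / (n * (n - 1))
    # (20): P^S_ik = (P_i(k) - P^R) / (1 - N P^R)
    estimate = [[(avg[i][k] - p_r) / (1.0 - n * p_r) for k in range(n)] for i in range(n)]
    return estimate, avg, p_r


P_SS = [[0.8, 0.15, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]  # hypothetical
P_RR = [[0.7, 0.2, 0.1], [0.05, 0.9, 0.05], [0.3, 0.1, 0.6]]    # hypothetical
P_SS_hat, P_S_R, P_R = recover_p_ss(P_SS, P_RR)
```

The averaged values satisfy $P_{i(k)} = P^R + P^S_{ik}(1 - NP^R)$, which makes visible both the recovery and the limit $P_{i(k)} \to P^R$ as $P^S_{ik} \to 0$.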

If the individual R-R transition probabilities are desired, (8) may be
averaged over subjects in the same way. Employing arguments analogous to those
developed before, the probability of an R-R transition is found to be

(21)  $P^R_{ik} = \frac{P_{(i)k} - P^S}{1 - NP^S}.$

This is the counterpart of (20), in that $P^S$ now denotes the mean transition
probability taken over all pairs of different stimuli.

The utility of reducing the S-R transitions to S-S and R-R transitions
can now be ascribed to the consequent increase in predictive power of the
model. If the S-S probability matrices have been determined (in $H$ experiments)
for each of $H$ sets of $N$ stimuli and if the R-R matrices have been
determined (in $H$ further experiments) for each of $H$ sets of $N$ responses,
the total number of experiments for which the S-R probability matrices
can be predicted is $N!H^2$. For, returning to (11), there are $N!$ distinct [...].
[...] predictions could be made only [...] already carried out, and the ratio just [...]

The Characterization of the S-S and R-R Processes in Terms of Psychological
Interstimulus and Interresponse Distances
In the preceding section the S-R process was reduced to S-S and R-R
processes which, in turn, were characterized by matrices of transition
probabilities. [...] the possibility that some simple relation exists between
$P^S_{ik}$ and $P^S_{ki}$ of the S-S matrix, and between $P^R_{ik}$ and $P^R_{ki}$ of the
R-R matrix. This, in turn, appears plausible if the probability of confusing
two stimuli (or responses) is some function of the dissimilarity between them
so that, say, $P^S_{ik}$ and $P^S_{ki}$ will increase or decrease together as the
dissimilarity between $S_i$ and $S_k$ is made, respectively, smaller or larger.

Instead of formally introducing the notion of dissimilarity, it is prefer- 
able to define the concept of distance, which has the advantage of a rigorous 
mathematical interpretation. Explicitly, a set of distances, $D_{ik}$, defined for
all pairs of elements, $S_i$ and $S_k$, is any collection of numbers satisfying, for
every $S_i$, $S_j$, and $S_k$, the following requirements called metric axioms
([2], pp. 5-16, [16], pp. 118-119):

(22)  $D_{ik} = 0$, if $i = k$,
(23)  $D_{ik} = D_{ki}$,
(24)  $D_{ij} + D_{jk} \ge D_{ik}$.

When speaking of the distance between $S_i$ and $S_k$, the symbol $D^S_{ik}$
will be used. Similarly the distance between $R_i$ and $R_k$ will be distinguished
by the symbol $D^R_{ik}$. Any set of elements for which a distance function satisfying
the metric axioms has been defined is called a metric space. The space may
be called a physical or a psychological space depending upon whether the
distances are determined from physical or psychological data. (An example
of a physical space is the set of sinusoidal tones with

$D_{ik} = \big[(f_i - f_k)^2 + (a_i - a_k)^2\big]^{1/2},$

where $f_i$ is the frequency and $a_i$ the amplitude of tone $S_i$. That this definition
satisfies axioms (22) and (23) is immediately clear. The satisfaction of (24)
follows from the inequality of Schwarz. For a review of some psychological
measures which could presumably be used to construct a psychological
space for this same set of tones, see Messick [17].)
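As a small check of the tone example, this sketch computes the distance above for hypothetical tones given as (frequency, amplitude) pairs and tests (22) to (24) on every pair and triple. The tone values are made up, and mixing frequency and amplitude units in one distance is for illustration only. The last line shows why the square root matters: squared differences alone violate the triangle axiom (24).

```python
import math
from itertools import product


def tone_distance(a, b):
    """D_ik = [(f_i - f_k)^2 + (a_i - a_k)^2]^(1/2) for tones given as (frequency, amplitude)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def satisfies_metric_axioms(points, dist, tol=1e-9):
    """Check (22) D_ii = 0, (23) D_ik = D_ki, and (24) D_ij + D_jk >= D_ik."""
    for p, q in product(points, repeat=2):
        if abs(dist(p, p)) > tol or abs(dist(p, q) - dist(q, p)) > tol:
            return False
    for p, q, r in product(points, repeat=3):
        if dist(p, q) + dist(q, r) < dist(p, r) - tol:
            return False
    return True


tones = [(440.0, 1.0), (450.0, 0.5), (500.0, 2.0), (880.0, 1.5)]  # hypothetical tones
euclidean_ok = satisfies_metric_axioms(tones, tone_distance)


def squared(p, q):
    """Squared distance, which is not a metric."""
    return tone_distance(p, q) ** 2


squared_ok = satisfies_metric_axioms([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], squared)  # 1 + 1 < 4
```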
It will be assumed that there exists some function, $f$, such that $P^S_{ik}$ is
proportional to $f(D^S_{ik})$. The factor of proportionality must depend upon $i$
for, although the average distance of $S_i$ from the other stimuli in the experimental
situation may be large or small, the probability of transition from $S_i$,
summed over all $k$, must be conserved by equation (2). Thus the relation
may be set down in the preliminary form

(25)  $P^S_{ik} = d_i \, f(D^S_{ik}),$

where $d_i$ is a constant associated with $S_i$, and where $D^S_{ik}$ satisfies the metric
axioms. Summing over all $k$,

$\sum_k P^S_{ik} = 1 = d_i \sum_k f(D^S_{ik}).$

Solving for $d_i$, it is immediately found that

(26)  $P^S_{ik} = \frac{f(D^S_{ik})}{\sum_j f(D^S_{ij})}.$




At this point some decision must be reached concerning the nature 
of the function f. This, of course, is one way of formulating the problem 
traditionally investigated in studies of stimulus generalization. The inde- 
pendent variable in such studies is some measure of stimulus dissimilarity, 
and the dependent variable is some measure of stimulus confusability (like 
the probability that the response, reinforced to one stimulus, will occur to 
the other).

Now, although these studies lend support to the conjecture that $f$ is a
continuous monotonically decreasing function, attempts to specify it with
greater precision have not led to any consistent picture ([14], pp. 616-617,
[26], pp. 577-579). This may be a consequence, at least in part, of the variety
of independent measures employed. The most frequent measures of dis-
similarity which have been used are distance on a physical scale and number
of just noticeable differences, JNDs, separating two stimuli. However,
there are theoretical objections to either of these measures. 

That psychological distance or confusion probability is not an invariant
function of physical distance is now well known. Some investigators, though,
have supposed that the summation of JNDs provides the kind of measure
required ([15], pp. 183-225). Unfortunately, in order to sum JNDs between
two stimuli, this summation must be carried out along some path between
these stimuli. But the resulting sum will be invariant and, therefore, possess
fundamental significance only if this path is a least path, that is, yields a
shortest distance (in psychological space) between the two stimuli. One
cannot suppose, in arbitrarily holding certain physical parameters constant
(as is ordinarily done in the summation of JNDs), that the summation is
constrained thereby to a shortest path (or geodesic) in psychological space,
even though it is, of course, confined to a shortest path (or straight line)
in physical space. Indeed, given any particular summation, there is no way
of ascertaining whether it was or was not carried out over a least path.

These considerations lead one to look for some way of estimating the
psychological distance between two stimuli without depending either upon
physical scales or upon any arbitrary path of integration. One approach,
suggested by Gulliksen and Wolfle [11], is to use methods in which subjects
directly estimate the similarity of pairs of stimuli. [...] [22] and, later,
Attneave [1] in their studies of [...] paired-associates learning have used
techniques of this kind, [...] these variables was not pursued.

More recently, a number of multidimensional scaling methods have been 
developed which make possible the determination of a set of interstimulus 
distances solely on the basis of similarity judgments [17]. Thus one might 
now extend the kind of approach proposed by Gulliksen and Wolfle to the 




quantitative study of generalization in paired-associates learning. However, 
beyond the fact that these judgmental methods are limited in application to 
mature human subjects, there appears to be no readily available means for 
interpretation of the dependent variables of these scaling procedures within 
the framework of existing behavior theory. Furthermore these methods 
have not, as yet, been extended to the response domain. 

To maintain the integrity of the present approach, it seems desirable to 
avoid the use of methods which essentially fall outside the scope of the be- 
havior model to be constructed. Instead of starting with an arbitrary measure 
of psychological distance in order to discover the relation between this measure 
and consequent stimulus confusion probabilities (the traditional approach 
to the generalization problem), the present strategy will be to begin with the 
confusion probabilities themselves and then, proceeding in the reverse 
direction, to discover a function, f, which will transform these probabilities 
into measures satisfying the metric axioms. 

Actually, there are many functions having the necessary properties. 
This is because the so-called triangle axiom given in inequality (24) is not 
particularly stringent. However, that requirement can be usefully strength- 
ened by making the reasonable assumption that physical space can be mapped 
into psychological space by a transformation which is not only continuous 
but also has continuous first partial derivatives. Such a transformation carries 
any straight line in physical space into a smooth (differentiable) curve in 
psychological space. 

The importance of this assumption derives from the fact that a segment
of a differentiable curve approximates more and more closely a segment of a
straight line as the two segments are made shorter and shorter. In the limit,
for three stimuli, $S_i$, $S_j$, and $S_k$, such that $S_j$ is between $S_i$ and $S_k$ on a
single physical dimension, axiom (24) should go over into

(27)  $D^S_{ij} + D^S_{jk} = D^S_{ik},$

provided that $S_i$ and $S_k$ are sufficiently close together in physical space.

It is possible to demonstrate (by introducing further assumptions) that
an exponential decay form can be deduced for the function $f$ so that (27)
will be satisfied for any three properly chosen stimuli. In order to maintain
the continuity of the present argument, however, it will be taken as a
primitive that $f$ is an exponential decay function. The justification of this
choice will have to depend upon the empirical results which attend its use.
It might be noted, however, that, apparently largely on the basis of
Hovland's results [13], Hull postulated an exponential generalization gradient
([...], pp. 183-225). Other generalization studies have also obtained data
consistent with this assumption [3, 9, 10, 12, 19, 23]. In any case, the
exponential function is perhaps the simplest function with the desirable behavior
that, as its argument ranges from zero over all positive bounds, it subsides
asymptotically from a finite value towards zero.
Substituting an exponential decay function for f, (26) becomes 


(28)  $P^S_{ik} = \frac{\exp(-D^S_{ik})}{\sum_j \exp(-D^S_{ij})}.$

Since psychological distance is symmetric in accordance with (23) and
since the psychological distance between any stimulus and itself must be
zero by (22), the entire matrix of transition probabilities may now be
reconstructed on the basis of just $N(N-1)/2$ distances. Thus only half of the
degrees of freedom in the probability matrix may really be free in the
theoretical sense.

The analogue of (28) is, for the response process,

(29)  $P^R_{ik} = \frac{\exp(-D^R_{ik})}{\sum_j \exp(-D^R_{ij})}.$

[...] investigated in only a [...; 20]. The particular choice is made on the basis
of the same arguments assembled in support of the selection of that function
in relation to the stimulus process. In addition, the symmetry of the model
suggests that the same function may apply in both cases.


The Introduction of the Stimulus and Response Weights 


After effecting the reductions of the last section, it must be acknowledged
that they were based on the questionable assumption that the psychological
distance from $S_i$ to $S_k$ is always identical to the psychological distance from
$S_k$ to $S_i$. It has, for instance, long been known that [...] to be mistaken for
familiar stimuli. Thus, under [...] may be seen as downright, whereas [...].
One might therefore suppose that the [...] is considerably less than the
distance in the [...] the impetus for introducing the distance notion in the
first place, that is, the possibility of completely characterizing the S-S
probability matrix in terms of a substantially reduced number of quantities.

However, it may be that apparent violations of distance symmetry 
can always be traced to some factor, like familiarity, which pertains to
individual (rather than to pairs of) stimuli and which has the consequence
that the presentation of one stimulus leads to the perception of another
more frequently than would be expected knowing the probability of perceptual
distortion in the opposite direction. With each $S_k$, then, there may be
associated a weight, $W^S_k$, such that, if $S_i$ is presented, the probability of
perceiving $S_k$ is proportional to $W^S_k$. Equation (28) will then assume the
modified form

(30)  $P^S_{ik} = \frac{W^S_k \exp(-D^S_{ik})}{\sum_j W^S_j \exp(-D^S_{ij})}.$

The introduction of the weights $W^S_k$ provides for both the redundancy in
the probability matrix and, presumably, any asymmetry among the stimulus
similarities. In particular it is assumed that the entire S-S transition prob- 
ability matrix can now be reconstructed on the basis of N(N — 1)/2 inde- 
pendent distances together with N weights. The ratio of the number of 
independent quantities reconstructed to the number used in the reconstruction, 
therefore, can be shown to be 2(N — 1)/(N + 1). 

Postulating that the response process also involves, for every response
$R_k$, a weight or response preference $W^R_k$, (29) becomes

(31)  $P^R_{ik} = \frac{W^R_k \exp(-D^R_{ik})}{\sum_j W^R_j \exp(-D^R_{ij})}.$


In order to estimate the stimulus weights from the S-S probabilities, 
(30) may be taken as a starting point. The probability that $S_i$ will be
correctly perceived is

(32)  $P^S_{ii} = \frac{W^S_i \exp(-D^S_{ii})}{\sum_j W^S_j \exp(-D^S_{ij})}.$

Since the weights occur in both the numerator and denominator, they can
be viewed as containing some arbitrary multiplicative factor which may be
chosen, for convenience, so that

(33)  $\frac{1}{N}\sum_k W^S_k = 1.$

Now from (22) it follows that $\exp(-D^S_{ii}) = 1$. Whence

(34)  $\frac{P^S_{ik}}{P^S_{ii}} = \frac{W^S_k}{W^S_i}\exp(-D^S_{ik}).$

To eliminate the distance term one has simply to form the product

(35)  $\left(\frac{P^S_{ik}}{P^S_{ii}}\right)^{1/2}\left(\frac{P^S_{ki}}{P^S_{kk}}\right)^{-1/2} = \frac{W^S_k}{W^S_i}.$




If this equation is summed over all $i$, the weights may be obtained except for
a factor, $\sum_i (1/W^S_i)$, which does not depend upon $k$ and, so, is determined
by (33).
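Under model (30), the weights can therefore be recovered from an S-S matrix alone. This sketch builds a noise-free $P_{SS}$ from hypothetical weights and symmetric distances via (30), sums the left-hand member of (35) over $i$, and rescales with (33) so that the weights average 1.

```python
import math


def p_ss_from_model(weights, dist):
    """Equation (30): P^S_ik = W_k exp(-D_ik) / sum_j W_j exp(-D_ij)."""
    n = len(weights)
    rows = []
    for i in range(n):
        terms = [weights[k] * math.exp(-dist[i][k]) for k in range(n)]
        total = sum(terms)
        rows.append([t / total for t in terms])
    return rows


def weights_from_p_ss(p):
    """Sum (35) over i, then rescale so that the weights average 1, as (33) requires."""
    n = len(p)
    # (35): (P_ik / P_ii)^(1/2) (P_ki / P_kk)^(-1/2) = W_k / W_i
    column_sums = [sum(math.sqrt((p[i][k] / p[i][i]) * (p[k][k] / p[k][i])) for i in range(n))
                   for k in range(n)]
    # Each column sum equals W_k * sum_i (1 / W_i); (33) fixes the common factor.
    scale = n / sum(column_sums)
    return [c * scale for c in column_sums]


W_true = [0.5, 1.0, 1.5]  # hypothetical weights, already averaging 1 as in (33)
D = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.2], [2.0, 1.2, 0.0]]  # hypothetical metric distances
W_hat = weights_from_p_ss(p_ss_from_model(W_true, D))
```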

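By substitution of (20) into the summed equation, the weights may be
expressed in terms of the overt S-R transition probabilities through the $N$
equations

(36)  $W^S_k = N \dfrac{\sum_i \left(\frac{P_{i(k)} - P^R}{P_{i(i)} - P^R}\right)^{1/2}\left(\frac{P_{k(i)} - P^R}{P_{k(k)} - P^R}\right)^{-1/2}}{\sum_j \sum_i \left(\frac{P_{i(j)} - P^R}{P_{i(i)} - P^R}\right)^{1/2}\left(\frac{P_{j(i)} - P^R}{P_{j(j)} - P^R}\right)^{-1/2}}.$

Following a similar derivation, the response weights can be shown to
be given by the $N$ equations

(37)  $W^R_k = N \dfrac{\sum_i \left(\frac{P_{(i)k} - P^S}{P_{(i)i} - P^S}\right)^{1/2}\left(\frac{P_{(k)i} - P^S}{P_{(k)k} - P^S}\right)^{-1/2}}{\sum_j \sum_i \left(\frac{P_{(i)j} - P^S}{P_{(i)i} - P^S}\right)^{1/2}\left(\frac{P_{(j)i} - P^S}{P_{(j)j} - P^S}\right)^{-1/2}}.$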


The Resolution of Psychological Distance into Orthogonal Coordinates in
Psychological Space

In this section procedures will be set forth for estimating the distances
$D^S_{ik}$ and $D^R_{ik}$. In addition, a further reduction of the distances will be proposed
so that the transition probability matrices can be reconstructed on the basis
of still fewer quantities.

If, in (35), the same terms are used but both exponents are taken as
positive, the weights may be eliminated to yield

(38)  $\exp(-D^S_{ik}) = \left(\frac{P^S_{ik}}{P^S_{ii}}\right)^{1/2}\left(\frac{P^S_{ki}}{P^S_{kk}}\right)^{1/2}.$

In terms of the observed S-R transition probabilities, then, the distances are
given by the $N^2$ equations

(39)  $D^S_{ik} = -\log\left[\left(\frac{P_{i(k)} - P^R}{P_{i(i)} - P^R}\right)^{1/2}\left(\frac{P_{k(i)} - P^R}{P_{k(k)} - P^R}\right)^{1/2}\right],$

as may be seen by a substitution from (20).
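Taking the logarithm of (38) gives each distance from four entries of the matrix: $D^S_{ik} = -\frac{1}{2}\log[(P^S_{ik}P^S_{ki})/(P^S_{ii}P^S_{kk})]$. The sketch applies this both to a noise-free $P_{SS}$ built from (30) and, through (39), to simulated averaged S-R values $P_{i(k)} = P^R + P^S_{ik}(1 - NP^R)$; the weights, distances, and $P^R = 0.05$ are hypothetical.

```python
import math


def p_ss_from_model(weights, dist):
    """Equation (30): P^S_ik = W_k exp(-D_ik) / sum_j W_j exp(-D_ij)."""
    n = len(weights)
    rows = []
    for i in range(n):
        terms = [weights[k] * math.exp(-dist[i][k]) for k in range(n)]
        total = sum(terms)
        rows.append([t / total for t in terms])
    return rows


def distances(p, p_r=0.0):
    """(38)/(39): D_ik = -(1/2) log[((p_ik - p_r)/(p_ii - p_r)) ((p_ki - p_r)/(p_kk - p_r))].

    With p = P_SS and p_r = 0 this is (38). With p = the averaged P_S(R) and p_r = P^R
    it is (39), because the factor 1 - N P^R from (20) cancels inside each ratio.
    """
    n = len(p)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            ratio = ((p[i][k] - p_r) / (p[i][i] - p_r)) * ((p[k][i] - p_r) / (p[k][k] - p_r))
            d[i][k] = -0.5 * math.log(ratio)  # undefined if a difference is <= 0 (noisy data)
    return d


W = [0.5, 1.0, 1.5]
D_true = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.2], [2.0, 1.2, 0.0]]
P_SS = p_ss_from_model(W, D_true)
D_from_ss = distances(P_SS)

P_R = 0.05
P_S_R = [[P_R + P_SS[i][k] * (1 - 3 * P_R) for k in range(3)] for i in range(3)]
D_from_sr = distances(P_S_R, P_R)
```

The comment on the logarithm anticipates the estimation problem taken up below: when $P_{i(k)}$ is close to $P^R$, small sampling errors produce large changes in the estimated distance.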


Of the $N^2$ distances given by (39), $N(N-1)/2$ have been supposed
to vary independently. However, suppose the $N$ stimulus points can be
imbedded in a Euclidean space of $K$ dimensions. In this case the distances
can be reconstructed on the basis of just $NK$ Cartesian coordinates, $X^S_{i\alpha}$
($\alpha = 1, 2, \ldots, K$), and the generalized Pythagorean theorem

(40)  $D^S_{ik} = \Big[\sum_\alpha (X^S_{i\alpha} - X^S_{k\alpha})^2\Big]^{1/2}.$

Thus an economy of description is possible in cases with $K < (N-1)/2$.

Torgerson has presented a general method for determining a set of
orthogonal coordinates, given a set of distances [25]. This procedure, as
modified by Messick and Abelson [18], is as follows: Starting with the $N \times N$
symmetric matrix, $D_{SS}$, of interstimulus distances, an $N \times N$ matrix,
$B_{SS}$, of scalar products of vectors from the centroid of the system of stimulus
points to all pairs of points, $S_i$ and $S_k$, is computed from

(41)  $B_{ik} = \frac{1}{2N}\sum_j (D^S_{ij})^2 + \frac{1}{2N}\sum_j (D^S_{jk})^2 - \frac{1}{2}(D^S_{ik})^2 - \frac{1}{2N^2}\sum_j\sum_l (D^S_{jl})^2.$

Young and Householder [27] have shown that, if this matrix is positive
semidefinite (that is, if the points can be imbedded in a $K$-dimensional
Euclidean space), it may be factored so that

(42)  $B_{SS} = X_{1S}X'_{1S} + X_{2S}X'_{2S} + \cdots + X_{KS}X'_{KS},$

where $X_{\alpha S}$ represents an $N \times 1$ matrix (or column vector) giving the
coordinates $X^S_{i\alpha}$ for all stimuli $S_i$ on the one dimension $\alpha$, and where $X'_{\alpha S}$
is the $1 \times N$ transpose of that matrix.

Now there are infinitely many possible factor decompositions of the
form given in (42), each one corresponding to a different orientation of the
orthogonal coordinate system in $K$-space. Which one of these is selected,
however, can be a matter of arbitrary stipulation, since they all yield the
same matrix $B_{SS}$ and since the interstimulus distances, as given in (40),
are invariant under orthogonal transformations of the coordinate system.
In practice the individual dimensions (or factors) may be extracted in such
a way that $X_{1S}$ accounts for the largest possible variance among the original
distances, $X_{2S}$ for the largest possible variance in any direction orthogonal
to the first dimension, and so on. In this way factoring may be terminated
when the ability to reconstruct the original transition probability matrix is
no longer significantly augmented by extraction of further dimensions.
Various procedures for factoring $B_{SS}$ in this way are available ([24], pp.
149-175, 473-510).
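A minimal pure-Python sketch of this step: form $B$ from the squared distances as in (41), then extract coordinate columns one dimension at a time so that $B \approx \sum_\alpha X_\alpha X'_\alpha$ as in (42). Power iteration with deflation is used here only as a stand-in for the factoring procedures cited; it assumes the leading eigenvalues are positive and uses a fixed-seed random start vector. The four points on a unit square are a hypothetical test configuration.

```python
import math
import random


def scalar_products(d):
    """Equation (41): B_ik = -1/2 [D_ik^2 - row mean - column mean + grand mean] of squared D."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][k] for i in range(n)) / n for k in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][k] - row_mean[i] - col_mean[k] + grand_mean) for k in range(n)]
            for i in range(n)]


def leading_dimensions(b, k, iters=200, seed=0):
    """Return up to k coordinate columns X_a with B ~ X_1 X_1' + ... + X_k X_k', as in (42)."""
    n = len(b)
    b = [row[:] for row in b]
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # random start vector (fixed seed)
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no further positive dimension: stop factoring
        x = [math.sqrt(lam) * vi for vi in v]
        columns.append(x)
        b = [[b[i][j] - x[i] * x[j] for j in range(n)] for i in range(n)]  # deflate B
    return columns


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # hypothetical unit square
D = [[math.dist(p, q) for q in points] for p in points]
X = leading_dimensions(scalar_products(D), 2)
recon = [[math.sqrt(sum((col[i] - col[k]) ** 2 for col in X)) for k in range(4)] for i in range(4)]
```

The recovered coordinates are only determined up to an orthogonal transformation, as noted above, so the check is on the reconstructed distances `recon` rather than on the coordinates themselves.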
Exactly the same operations may be applied to the matrix of R-R transition
probabilities, $P_{RR}$, to obtain a symmetric matrix, $D_{RR}$, of interresponse
distances. The computation is implemented by the $N^2$ analogues of (39), namely,

(43)  $D^R_{ik} = -\log\left[\left(\frac{P_{(i)k} - P^S}{P_{(i)i} - P^S}\right)^{1/2}\left(\frac{P_{(k)i} - P^S}{P_{(k)k} - P^S}\right)^{1/2}\right].$

Once again, if the $N$ response points can be imbedded in an $L$-dimensional
Euclidean space, the distances may be reduced to $NL$ orthogonal coordinates
$X^R_{i\beta}$ ($\beta = 1, 2, \ldots, L$) such that

(44)  $D^R_{ik} = \Big[\sum_\beta (X^R_{i\beta} - X^R_{k\beta})^2\Big]^{1/2}.$

The actual computation of the coordinates $X^R_{i\beta}$ will again be carried out by
factoring a scalar product matrix $B_{RR}$ so that

(45)  $B_{RR} = X_{1R}X'_{1R} + X_{2R}X'_{2R} + \cdots + X_{LR}X'_{LR},$

where $X_{\beta R}$ is the column vector containing the coordinates for all responses
on dimension $\beta$.

By substituting (40) and (44) into (30) and (31), (11) may now be stated
in the more explicit form

(46)  $P_{SR.m} = \left[\frac{W^S_k \exp\{-[\sum_\alpha (X^S_{i\alpha} - X^S_{k\alpha})^2]^{1/2}\}}{\sum_j W^S_j \exp\{-[\sum_\alpha (X^S_{i\alpha} - X^S_{j\alpha})^2]^{1/2}\}}\right] \cdot J_m \cdot \left[\frac{W^R_k \exp\{-[\sum_\beta (X^R_{i\beta} - X^R_{k\beta})^2]^{1/2}\}}{\sum_j W^R_j \exp\{-[\sum_\beta (X^R_{i\beta} - X^R_{j\beta})^2]^{1/2}\}}\right],$

where the bracketed expressions represent matrices generated by allowing
the indices $i$ and $k$ to run from 1 to $N$. Thus the complete set of $N^2$ S-R
transition probabilities can be predicted on the basis of $2N$ weights, $(K + L)N$
coordinates, and the permutation matrix, $J_m$, corresponding to the particular
S-R assignment enforced.
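Equation (46) can be read as a recipe for predicting a whole S-R matrix. This sketch, with hypothetical coordinates (K = 1 stimulus dimension, L = 2 response dimensions), weights, and assignment, builds the S-S and R-R matrices from (30) and (31) with the Euclidean distances (40) and (44), and then multiplies them through $J_m$ as in (11).

```python
import math


def confusion_matrix(coords, weights):
    """(30)/(31) with distances from (40)/(44): row i is W_k exp(-D_ik), normalized."""
    n = len(coords)
    rows = []
    for i in range(n):
        terms = [weights[k] * math.exp(-math.dist(coords[i], coords[k])) for k in range(n)]
        total = sum(terms)
        rows.append([t / total for t in terms])
    return rows


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)] for i in range(n)]


def predict_s_r(x_s, w_s, x_r, w_r, assign):
    """(46): P_SR.m = [S-S matrix] . J_m . [R-R matrix], with assign[i] = k for R_k -> S_i."""
    n = len(assign)
    j_m = [[1 if assign[i] == k else 0 for k in range(n)] for i in range(n)]
    return matmul(matmul(confusion_matrix(x_s, w_s), j_m), confusion_matrix(x_r, w_r))


X_S = [(0.0,), (1.0,), (2.5,)]                  # hypothetical stimulus coordinates, K = 1
X_R = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]      # hypothetical response coordinates, L = 2
W_S, W_R = [1.2, 0.9, 0.9], [1.0, 1.0, 1.0]      # hypothetical weights, each averaging 1
assign = [1, 2, 0]
P_SR = predict_s_r(X_S, W_S, X_R, W_R, assign)
```

With stimuli and responses this well separated, the most probable response to each $S_i$ is its assigned one, and generalization errors are spread over the others according to both sets of distances.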


Problems of Estimation 


First, with respect to the weights, the left-hand member of (35) will be
unreliable, in the statistical sense, when $P^S_{ik}$ and $P^S_{ki}$ are small. A
technique useful in alleviating this difficulty is the following: After the
weights have been estimated in a preliminary way by (36), each
row, $i$, of the matrix of quantities $W^S_k/W^S_i$ given in (35) may be multiplied
through by the tentative estimate for the corresponding $W^S_i$. In the resulting
matrix, each of the $N$ entries in column $k$ will be an estimate of $W^S_k$. The final
estimate may then be taken, for each column, as the median entry in that
column. Exactly the same technique may be used to refine the response
weight estimates.
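The median refinement can be sketched as follows, assuming a matrix `ratio[i][k]` of the per-row estimates of $W^S_k/W^S_i$ from (35) and preliminary weights from (36). In this hypothetical example the preliminary weights are slightly off and one ratio (standing in for an estimate based on small $P^S_{ik}$ and $P^S_{ki}$) is badly corrupted; the column medians still return the true weights.

```python
import statistics


def refine_weights(ratio, preliminary):
    """Multiply row i of the W_k / W_i matrix by the tentative W_i; take each column's median."""
    n = len(ratio)
    scaled = [[ratio[i][k] * preliminary[i] for k in range(n)] for i in range(n)]
    return [statistics.median([scaled[i][k] for i in range(n)]) for k in range(n)]


W_true = [0.5, 1.0, 1.5]                                   # hypothetical
ratio = [[W_true[k] / W_true[i] for k in range(3)] for i in range(3)]
ratio[0][2] = 10.0                                          # unreliable entry (true value 3.0)
preliminary = [0.55, 1.0, 1.45]                             # hypothetical estimates from (36)
refined = refine_weights(ratio, preliminary)
```

The median is used because a single wild entry in a column, which would drag a mean far off, leaves it unchanged.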

In the estimation of distances the difficulty stems from the use of the
logarithmic transformation of (39) and the consequent fact that, if $P_{i(k)}$
and $P_{k(i)}$ are small, a slight variation in these leads to a large variation in $D^S_{ik}$.
The admission of such extreme instability in the determination of large
interstimulus distances will have a disruptive influence on the factor solution
used in obtaining the stimulus coordinates.

One might be inclined to redefine the distance between two widely
separated stimuli, $S_i$ and $S_k$, as the sum of distances over some connected
path of smaller distances between them. Thus, in Fig. 2, it might be supposed,
as an approximation, that

$D^S_{ik} = D^S_{ia} + D^S_{ab} + D^S_{bc} + D^S_{ck},$

where $S_a$, $S_b$, and $S_c$ are the intermediate stimuli on one such path.

FIGURE 2
Different ways of approximating a large interstimulus distance by a sequence of smaller
interstimulus distances. The $S_i$ represent stimuli in a two-dimensional psychological
space.


The problem here is one of choosing between alternative paths such as
I and II. Such a selection should be guided by two general rules.
First, a relatively direct path should be chosen. That is, the sum of distances
over the path, $\sum D$, should be small. Second, a path should be chosen which
does not contain any relatively large (and therefore unreliable) distances.
That is, the largest distance in the path, $\max(D)$, should be small. Path I,
then, is objectionable on both of these grounds.

Combining these rules, every distance in the matrix $D_{SS}$ may be redefined
as the sum, $\sum D$, of distances over that connected path for which the
product

(47)  $\max(D) \cdot \sum D$ is minimum.

The small distances will generally remain unchanged during the re-estimation
procedure. The large distances, however, will be subject to substantially less
unsystematic error. The systematic error will presumably be somewhat
augmented, however. The technique given here is not the statistically exact
one which would have to take into account the number of events upon which
each probability estimate is based.
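The text does not say how the path minimizing (47) is to be found. One possible search, offered here as an assumption rather than as the paper's procedure: for each candidate bound t on the largest step, run Dijkstra's algorithm restricted to steps no longer than t and score the path found by its largest step times its total. When t equals the largest step of the optimal path, Dijkstra returns a path at least as good, so the best score over all t is exact. The four-stimulus distance matrix is hypothetical.

```python
import heapq


def shortest_sum(d, src, dst, bound):
    """Dijkstra over steps of length <= bound; returns (total, largest step) or None."""
    n = len(d)
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        s, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in range(n):
            if v != u and d[u][v] <= bound and s + d[u][v] < best.get(v, float("inf")):
                best[v] = s + d[u][v]
                prev[v] = u
                heapq.heappush(heap, (best[v], v))
    if dst not in done:
        return None
    largest, node = 0.0, dst
    while node != src:  # walk back along the path to find its largest step
        largest = max(largest, d[prev[node]][node])
        node = prev[node]
    return best[dst], largest


def path_distance(d, i, k):
    """Redefined D_ik: the sum of D over the path minimizing max(D) * sum(D), as in (47)."""
    if i == k:
        return 0.0
    candidates = []
    for t in sorted({x for row in d for x in row if x > 0}):
        found = shortest_sum(d, i, k, t)
        if found is not None:
            total, largest = found
            candidates.append((largest * total, total))
    return min(candidates)[1]


D = [[0.0, 1.0, 2.0, 5.0],
     [1.0, 0.0, 1.0, 2.2],
     [2.0, 1.0, 0.0, 1.0],
     [5.0, 2.2, 1.0, 0.0]]
D_new = [[path_distance(D, i, k) for k in range(4)] for i in range(4)]
```

Here the large direct distance $D_{14} = 5$ is replaced by 3, the total over the chain of unit steps, while the unit distances themselves are left unchanged, in line with the remark that small distances generally remain unchanged.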

Finally the application of the model is impeded insofar as it holds only
for a given stage of learning, that is, over spans of trials for which the $P_{i(k)}$
remain relatively constant. The weights and distances could be estimated
with greater reliability if they were based upon probabilities averaged over 
the entire learning session. 

Now it is an implication of the model that the ratio of any two distances must be invariant over learning. Therefore some function, $g$, of time, $t$, exists such that

(48)  $D^S_{ik}(t) = g(t) \cdot \bar D^S_{ik}$,

where $D^S_{ik}(t)$ is the psychological distance between $S_i$ and $S_k$ at some given time, $t$, as defined by (39), and where $\bar D^S_{ik}$ is some fixed distance between these stimuli which does not depend on $t$.

The fixed distances, $\bar D^S_{ik}$, contain an arbitrary multiplicative constant which may be so adjusted that

(49)  $\dfrac{1}{T}\int_0^T g(t)\,dt = 1$.

It then follows that

(50)  $\bar D^S_{ik} = \dfrac{1}{T}\int_0^T D^S_{ik}(t)\,dt$.


However, not only is this average computationally impractical, but it also heavily weights the large estimates for the $D^S_{ik}(t)$, which are based on small numbers of transitions and so are extremely unreliable.

Since the present model pertains to generalization at [...] rather than to the course of learning over time, [...] regarding the function $g(t)$. However, [...] the $P_{ik}$ (for $i \neq k$) decline rapidly at first [...] values of $W^S_k/N$ toward their lower asymptotes, [...] the average given in (50) may be more reli[...]

(51)  $\bar D^S_{ik} = -\log\left\{\dfrac{1}{T}\int_0^T \exp\left[-D^S_{ik}(t)\right]dt\right\}$

[...] cover too wide a range. This average, although it does discount the large, unstable distance estimates, requires the inclusion of a constant, $C^S$, in order to subtract out the essentially random transitions which account for the $P_{ik}$ values of $W^S_k/N$ before the subject has acquired any knowledge of the prevailing S-R assignment.
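To see why an average of the form (51) discounts large, unreliable estimates while (50) does not, the sketch below compares the two on a hypothetical sequence of distance estimates. It assumes that (51) has the reconstructed form shown above. Each integral is approximated by a mean over equally spaced trials. The numbers are made up for illustration.

```python
import math

def arithmetic_average(distances):
    """Discrete analogue of (50): the plain mean of the D(t)."""
    return sum(distances) / len(distances)

def log_mean_exp_average(distances):
    """Discrete analogue of the reconstructed (51): -log of the mean of exp(-D(t))."""
    mean_similarity = sum(math.exp(-d) for d in distances) / len(distances)
    return -math.log(mean_similarity)

# Hypothetical estimates: two stable small values and one large, unstable value.
estimates = [0.5, 0.6, 5.0]
print(arithmetic_average(estimates))    # about 2.03, pulled up by the 5.0
print(log_mean_exp_average(estimates))  # about 0.95, close to the small values
```

Because exp(-D) is nearly zero for a large D, an unstable large estimate contributes almost nothing to the second average. By Jensen's inequality, the second average can never exceed the first.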




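By transposing terms in (51), then, it may be seen that, for purposes of estimation, (30) is to be replaced by

(52)  $\bar P_{ik} = \dfrac{W^S_k\left[\exp(-\bar D^S_{ik}) + C^S\right]}{\sum_j W^S_j\left[\exp(-\bar D^S_{ij}) + C^S\right]}$,

where $\bar P_{ik}$ applies to the entire learning session. Following through the derivation for (39), the psychological distances are found to be approximately given by

(53)  $\bar D^S_{ik} = -\log\left[(1 + C^S)\left(\dfrac{\bar P_{ik}\bar P_{ki}}{\bar P_{ii}\bar P_{kk}}\right)^{1/2} - C^S\right]$.

Likewise, it is assumed that there exists a constant, $C^R$, such that

(54)  $\bar D^R = -\log\left[(1 + C^R)\left(\ldots\right)^{1/2} - C^R\right]$,

the response-side analogue of (53).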


In order to use (53), it is necessary to obtain an estimate for $C^S$. This may be done by calculating a set of stimulus coordinates under the assumption that $C^S = 0$. If, then, the quantities

$\left(\dfrac{\bar P_{ik}\bar P_{ki}}{\bar P_{ii}\bar P_{kk}}\right)^{1/2} \quad (i, k = 1, 2, \ldots, N)$

are plotted as a function of the distances reconstructed from (40), an asymptote, $c$, for large distances may be estimated by drawing a smooth curve through the data points. If the responses have been selected so that $P^R = 0$, then $C^S = c/(1 - c)$. Exactly the same method can be used to estimate $C^R$ if $P^S = 0$.
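The estimation of $C^S$ and its use in (53) can be sketched numerically. The sketch assumes the reconstructed forms of (52) and (53) given above. The helper `model_probabilities` is a hypothetical generator that builds transition probabilities from known weights, distances, and $C$, so that the correction can be checked. The weights and distances are illustrative, not data from the study. The step from the asymptote to the constant rests on one observation. Under (52), the plotted ratio equals $[\exp(-D) + C]/(1 + C)$, which approaches $c = C/(1 + C)$ as $D$ grows, so $C = c/(1 - c)$.

```python
import math

def constant_from_asymptote(c):
    """Invert c = C / (1 + C), the large-distance asymptote of q_ik, to get C = c / (1 - c)."""
    return c / (1.0 - c)

def symmetric_ratio(P, i, k):
    """q_ik = (P_ik * P_ki / (P_ii * P_kk)) ** 0.5, the quantity plotted against distance."""
    return math.sqrt(P[i][k] * P[k][i] / (P[i][i] * P[k][k]))

def corrected_distance(P, i, k, C):
    """Distance per the reconstructed (53): -log[(1 + C) q_ik - C]."""
    return -math.log((1.0 + C) * symmetric_ratio(P, i, k) - C)

def model_probabilities(W, D, C):
    """Hypothetical generator following (52): P_ik proportional to W_k [exp(-D_ik) + C]."""
    P = []
    for i in range(len(W)):
        raw = [W[k] * (math.exp(-D[i][k]) + C) for k in range(len(W))]
        total = sum(raw)
        P.append([x / total for x in raw])
    return P

W = [0.2, 0.3, 0.5]
D = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]]
P = model_probabilities(W, D, C=0.25)
print(corrected_distance(P, 0, 2, C=0.25))  # recovers D[0][2] = 2.0 up to rounding
print(corrected_distance(P, 0, 2, C=0.0))   # ignoring C understates the distance
```

The weights cancel in the ratio, which is why the correction needs only $C$ and the transition probabilities.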
The practical advantage of the counterbalancing discussed in connection with (20) and (21) results from the following finding. In the application of (53) and (54), if $P^R$ and $P^S$ are small, they may be assumed equal to zero. The only appreciable consequence of this procedure appears to be a slight inflation of the estimates, respectively, for $C^S$ and $C^R$.

With regard to the estimation of the stimulus and response weights, the constants $C^S$ and $C^R$ drop out in the derivation of (36) and (37). Therefore the weights can be approximately estimated from these equations, as they stand, even though the S-R transition probabilities are averaged over the entire learning session.


Appendix

Since, in all the experimental work to be reported, $N = 9$, it will be useful to construct a set of permutation matrices having the property that, over all subjects, every pair of responses is assigned to each pair of stimuli the same number of times. In order to do this, it is convenient to introduce $O_3$, the $3 \times 3$ null matrix; $I_3$, the $3 \times 3$ identity matrix; $H_3$, given by

$H_3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$;

and $K_3^{ij}$, the $3 \times 3$ matrix with elements $K_{gh} = 1$ if $g = i$ and $h = j$, and $K_{gh} = 0$ otherwise. If, for positive integers $r$,


E 


н, O, ο, 
Jo-u = Jua = | O, Н, о, |, 
O, O, H, 
ο, $ ο 
Jior- = Jua = Ο: 0, І, Jaen у 
L O, O, 
L 0, o,][g" κ’ x» ) 
Je = O, Hy 0, || κ. xz ος 
ο, ο, mI Le KP κ» 


then the permutation matrix for any subject m (starting with some arbitrary 
initial assignment, Jui) is given by 


Im = ТЈ: Jm. 


The first 36 matrices formed in this way assign every pair of responses to each pair of stimuli just once. The second 36 assign the same pairs of responses to the same pairs of stimuli with the orders reversed from those of the first 36 assignments. Thereafter the same assignments are repeated with every succeeding 72 matrices. Thus, if $N = 9$, $M$ can be any multiple of 72. Indeed, since $P_{ik}$ will generally not be far from $P_{ki}$, a satisfactory degree of counterbalancing can probably be obtained with only the first 36 assignments.
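The balancing property claimed for these assignments can be checked mechanically. The sketch below does not reproduce the $J$ construction. It is a hypothetical checker that takes a list of permutations, where `perm[s]` is the index of the response assigned to stimulus $s$. It counts how often each pair of stimuli receives each pair of responses, either ignoring or respecting order. The small $N = 3$ test shows that the three cyclic permutations (the powers of $H_3$) balance unordered pairs but not ordered ones. Adding the remaining three permutations balances the ordered pairs as well, which mirrors the role of the second set of 36 assignments.

```python
from collections import Counter
from itertools import combinations

def pair_counts(perms, ordered):
    """Count how often each stimulus pair (s1 < s2) receives each response pair."""
    counts = Counter()
    n = len(perms[0])
    for perm in perms:                       # perm[s] = response assigned to stimulus s
        for s1, s2 in combinations(range(n), 2):
            r = (perm[s1], perm[s2])
            key = r if ordered else frozenset(r)
            counts[((s1, s2), key)] += 1
    return counts

def is_counterbalanced(perms, ordered=False):
    """True if every stimulus pair receives every response pair equally often."""
    n = len(perms[0])
    n_pairs = n * (n - 1) // 2
    n_response_pairs = n * (n - 1) if ordered else n_pairs
    counts = pair_counts(perms, ordered)
    return len(counts) == n_pairs * n_response_pairs and len(set(counts.values())) == 1

cyclic = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # the powers of H_3 as permutations
others = [(0, 2, 1), (2, 1, 0), (1, 0, 2)]   # the remaining permutations of three items
print(is_counterbalanced(cyclic), is_counterbalanced(cyclic, ordered=True))
print(is_counterbalanced(cyclic + others, ordered=True))
```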


REFERENCES

[2] Blumenthal, L. M. Theory and applications of distance geometry. Oxford: Clarendon Press, 1953.

[3] Brown, J. S., Bilodeau, E. A., and Baron, M. R. Bidirectional gradients in the strength of a generalized voluntary response to stimuli on a visual-spatial dimension. J. exp. Psychol., 1951, 41, 52-61.

[4] Busemann, H. The geometry of geodesics. New York: Academic Press, 1955.

[5] Bush, R. R. and Mosteller, F. A model for stimulus generalization and discrimination. Psychol. Rev., 1951, 58, 413-423.

[6] Bush, R. R. and Mosteller, F. Stochastic models for learning. New York: Wiley, 1955.

[7] Duncan, C. P. Development of response generalization gradients. J. exp. Psychol., 1955, 50, 26-30.

[8] Estes, W. K. Towards a statistical theory of learning. Psychol. Rev., 1950, 57, 94-107.

[9] Frick, F. C. An analysis of an operant discrimination. J. Psychol., 1948, 26, 93-123.

[10] Gibson, E. J. Sensory generalization with voluntary reactions. J. exp. Psychol., 1939, 24, 237-253.

[11] Gulliksen, H. and Wolfle, D. L. A theory of learning and transfer: I. Psychometrika, 1938, 3, 127-149.

[12] Guttman, N. and Kalish, H. I. Discriminability and stimulus generalization. J. exp. Psychol., 1956, 51, 79-88.

[13] Hovland, C. I. The generalization of conditioned responses: I. The sensory generalization of conditioned responses with varying frequencies of tone. J. gen. Psychol., 1937, 17, 125-148.

[14] Hovland, C. I. Human learning and retention. In S. S. Stevens (Ed.), Handbook of experimental psychology. New York: Wiley, 1951.

[15] Hull, C. L. Principles of behavior. New York: Appleton-Century, 1943.

[16] Kelley, J. L. General topology. New York: Van Nostrand, 1955.

[17] Messick, S. J. Some recent theoretical developments in multidimensional scaling. Educ. psychol. Measmt, 1956, 16, 82-100.

[18] Messick, S. J. and Abelson, R. P. The additive constant problem in multidimensional scaling. Psychometrika, 1956, 21, 1-15.

[19] Margolius, G. Stimulus generalization of an instrumental response as a function of the number of reinforced trials. J. exp. Psychol., 1955, 49, 105-111.

[20] Noble, M. E. and Bahrick, H. P. Response generalization as a function of intratask response similarity. J. exp. Psychol., 1956, 51, 405-412.

[21] Pillsbury, W. B. A study in apperception. Amer. J. Psychol., 1897, 8, 315-393.

[22] Plotkin, L. Stimulus generalization in Morse code learning. Arch. Psychol., 1943, 40, No. 287.

[23] Rosenbaum, G. Stimulus generalization as a function of level of experimentally induced anxiety. J. exp. Psychol., 1953, 45, 35-43.

[24] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. Chicago Press, 1947.

[25] Torgerson, W. S. Multidimensional scaling: I. Theory and method. Psychometrika, 1952, 17, 401-420.

[26] Woodworth, R. S. and Schlosberg, H. Experimental psychology. New York: Holt, 1954.

[27] Young, G. and Householder, A. S. Discussion of a set of points in terms of their mutual distances. Psychometrika, 1938, 3, 19-22.

Manuscript received 8/17/56
Revised manuscript received 4/29/57


PSYCHOMETRIKA—VOL. 22, NO. 4
DECEMBER, 1957 


THE RELATIONSHIP BETWEEN FACTORIAL COMPOSITION OF 
TEST ITEMS AND MEASURES OF TEST RELIABILITY* 


JOHN W. COTTON
DONALD T. CAMPBELL
AND 
R. DANIEL MALONE 


NORTHWESTERN UNIVERSITY 


For continuous distributions associated with dichotomous item scores, the proportion of common-factor variance in the test, $H^2$, may be expressed as a function of intercorrelations among items. $H^2$ is somewhat larger than the coefficient $\alpha$ except when the items have only one common factor and its loadings are restricted in value. The dichotomous item scores themselves are shown not to have a factor structure, precluding direct interpretation of the Kuder-Richardson coefficient, $r_{K-R}$, in terms of factorial properties. The value of $r_{K-R}$ is equal to that of a coefficient of equivalence, $H_\phi^2$, when the mean item variance associated with common factors equals the mean interitem covariance. An empirical study with synthetic test data from populations of varying factorial structure showed that the four parameters mentioned may be adequately estimated from dichotomous data.


Factor structure and test reliability, $r_{tt}$, are closely connected in testing theory. In Cronbach's [2] joint treatment of these two topics he distinguishes four kinds of reliability coefficients: (1) stability, (2) stability-and-equivalence, (3) equivalence, and (4) hypothetical self-correlation. Each, he says, is uniquely characterized by the particular factor score variances assigned to error variance, $\sigma_e^2$. Thus the coefficient of equivalence is defined by a general formula, $r_{tt} = 1 - (\sigma_e^2/\sigma_t^2)$, where $\sigma_t^2$ is the variance of total test scores and $\sigma_e^2$ includes both the variance of specific factors for each item and the residual error variance. Stated differently, the coefficient of equivalence measures the degree to which the test score indicates the status of the individual at the present instant in the general and group factors. A definition of another coefficient, such as that of [...], uses the same general formula with a new specification of error variance.

In a later analysis of the Kuder-Richardson ([7], formula 20) coefficient of equivalence, $r_{K-R}$, Cronbach [3] has stated in greater detail how total test variance depends upon the factor loadings of the individual items. He repeats his earlier argument that a general coefficient, $\alpha$, of which $r_{K-R}$ is a special case, is the proportion of test variance due to common factors when a special assumption holds: the mean common-factor variance within items must equal the mean interitem covariance. This conclusion is weakened by the recognition that the assumption does not hold for an interitem correlation matrix with rank greater than one. Thus $r_{K-R}$ is the proportion of common-factor variance only when there is but one common factor. Cronbach presumes that $\alpha$ will closely estimate this proportion even with multifactored cases unless the test contains distinct clusters.

*This study was supported in part by an Air Force contract (Contract No. AF18(600)-170), monitored by the Crew Research Laboratory, Air Force Personnel and Training Research Center, Randolph Air Force Base, Texas. Permission is granted for reproduction, translation, publication, use and disposal in whole and in part by or for the United States Government. Further support [...] the Northwestern University Graduate School. The computational [...] Professor Meyer Dwass provided mathematical [...] both directly and indirectly relevant to the paper.

The present paper argues that a strict interpretation of the factorial hypothesis requires a reanalysis of Cronbach's notions. This reanalysis requires a separate treatment of factorial structure and of reliability whenever dichotomously scored items comprise the test under analysis. In brief, a factor structure of such items and the related factor structure of the total test exist for continuous distributions underlying each dichotomous distribution. These structures do not exist for the dichotomous distribution. On the other hand, since the scoring of items [...]

A consequence of this reasoning will be the specification of two distinct $\alpha$ coefficients: $\alpha$ for the continuous case, and $\alpha_\phi$ or $r_{K-R}$ for the dichotomous case. The conditions under which $\alpha$ will equal the proportion of common-factor variance in the total test variance, $H^2$, will be examined, extending Cronbach's statement on the single-factoredness requirement. The coefficient $\alpha_\phi$ or $r_{K-R}$ will be shown to approximate a coefficient of equivalence, $H_\phi^2$, for a test with dichotomously scored items. Thus, contrary to Cronbach's belief, $r_{K-R}$ will never estimate common-factoredness, even for single-factored tests, and will estimate a coefficient of equivalence only under special conditions even more restrictive than single-factoredness.


Theory of the Continuous Case

Thurstone's multiple-factor theory has as its basis the hypothesis ([8], pp. 69-74):

(1)  $s_{ip} = \sum_{m=1}^{l} a_{im} z_{mp} + b_i y_{ip} + e_i \eta_{ip}$,

where

$s_{ip}$ = standard score for person $p$ on item $i$ of a test $(i = 1, 2, \ldots, n)$,
$a_{im}$ = loading of common factor $m$ on item $i$ $(m = 1, 2, \ldots, l)$,
$z_{mp}$ = standard score for person $p$ on common factor $m$,
$b_i$ = loading of item $i$ on a factor specific to item $i$,
$e_i = \sqrt{1 - \sum_{m=1}^{l} a_{im}^2 - b_i^2}$ = loading of error on item $i$,

and $y_{ip}$ and $\eta_{ip}$ have definitions analogous to that for $z_{mp}$. The $z_{mp}$, $y_{ip}$, and $\eta_{ip}$ are independent of each other. The values of $a_{im}$, $b_i$, and $e_i$ are parameters of the test.
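The item raw scores, $X_{ip} = \sigma_i s_{ip} + \mu_i$, where $\mu_i$ and $\sigma_i$ are the population mean and standard deviation of the $X_{ip}$, may be summed to give total test scores

(2)  $T_p = \sum_{i=1}^{n} X_{ip} = \sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big) z_{mp} + \sum_{i=1}^{n}\sigma_i b_i y_{ip} + \sum_{i=1}^{n}\sigma_i e_i \eta_{ip} + \sum_{i=1}^{n}\mu_i$.

On a second administration of the test, the standard scores and total test scores will be given by

(3)  $s'_{ip} = \sum_{m=1}^{l} a_{im} z_{mp} + b_i y_{ip} + e_i \eta'_{ip}$,

and

(4)  $T'_p = \sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big) z_{mp} + \sum_{i=1}^{n}\sigma_i b_i y_{ip} + \sum_{i=1}^{n}\sigma_i e_i \eta'_{ip} + \sum_{i=1}^{n}\mu_i$.

Equations (3) and (4) differ from (1) and (2) only in that $\eta_{ip}$ is a random variable changing in value to $\eta'_{ip}$ on the second administration of the test. The $\eta'_{ip}$ are independent of all previous variables.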
Following Wilks ([10], pp. 33-35) and remembering that the variances of the $z_{mp}$, $y_{ip}$, $\eta_{ip}$, and $\eta'_{ip}$ are all unity leads to

(5)  $\operatorname{var}(T) = \operatorname{var}(T') = \sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2 + \sum_{i=1}^{n}\sigma_i^2 b_i^2 + \sum_{i=1}^{n}\sigma_i^2 e_i^2$.

Equation (5) is the same in substance as Cronbach's [3] equations (2) and (3). Similarly the covariance of total test scores on successive administrations is given by

(6)  $\operatorname{cov}(T, T') = \sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2 + \sum_{i=1}^{n}\sigma_i^2 b_i^2$.

The ratio of $\operatorname{cov}(T, T')$ to $\operatorname{var}(T)$ is the hypothetical self-correlation, $r_{TT'}$, of the continuous scores $T$ and $T'$. It is also seen from (5) and (6) to be the ratio of variance contributed by common and specific factors to total test variance.

The relative contribution of each separate common or specific factor to $\operatorname{var}(T)$ is

(7)  $A_m^2 = \Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2 \Big/ \operatorname{var}(T)$,

(8)  $B_i^2 = \sigma_i^2 b_i^2 / \operatorname{var}(T)$.

$A_m^2$ and $B_i^2$ may also be called the squared factor loadings of common factor $m$ for the total test and of specific factor $i$ for that test.

The relative contribution of common factors to $\operatorname{var}(T)$ has previously been termed $H^2$:

(9)  $H^2 = \sum_{m=1}^{l} A_m^2 = \sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2 \Big/ \operatorname{var}(T)$.

$H^2$ may also be considered a coefficient of equivalence for the continuous case, since $1 - H^2$ includes both specific variance and error variance.

It may be of note that, with a centroid solution having only positive first-factor loadings, Thurstone has shown that $\sum_i a_{im} = 0$ for $m \neq 1$. This implies that $A_m^2 = 0$ for $m \neq 1$ and, consequently, that $H^2 = A_1^2$ for the centroid solution.

Equation (9) must be revised to define $H^2$ in terms of parameters more readily determined than the $a_{im}$'s. By definition, $r_{ij}$, the correlation coefficient between item $i$ and item $j$ in a single test administration, is the expected value of $(s_{ip} s_{jp})$. This definition coupled with (1) yields

(10)  $r_{ij} = \sum_{m=1}^{l} a_{im} a_{jm}, \quad (i \neq j)$;

(11)  $r_{ii} = \sum_{m=1}^{l} a_{im}^2 + b_i^2 + e_i^2 = 1, \quad (i = j)$.

The expressions (10) and (11) in turn imply that (5) may be rewritten

(12)  $\operatorname{var}(T) = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r_{ij}$.

Equation (12) will be used to restate the denominator of (9). Reanalyzing the numerator of that equation, define $r^*_{ij}$ as the correlation coefficient between items $i$ and $j$ on successive administrations of the test, excluding the contribution of specific factors. In this case

(13)  $r^*_{ij} = r_{ij} = \sum_{m=1}^{l} a_{im} a_{jm}, \quad (i \neq j)$,

and

(14)  $r^*_{ii} = h_i^2 = \sum_{m=1}^{l} a_{im}^2, \quad (i = j)$,

where $h_i^2$ is the proportion of common-factor variance to total variance for item $i$. Equations (13) and (14) lead to the conclusion

(15)  $\sum_{m=1}^{l}\Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r^*_{ij}$.

Equations (12) and (15) may now be employed to rewrite (9) as

(16)  $H^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r^*_{ij} \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r_{ij}$.

This equation is directly applicable to situations in which the $r_{ij}$, $\sigma_i$, and $h_i^2$ are known but the factor loadings themselves have not been determined.
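Equation (16) can be checked numerically against (9). The sketch below uses hypothetical function names and illustrative loadings, not values from the paper. It computes $H^2$ once from the loadings via (5) and (9), and once from the correlations implied by (10), (11), (13), and (14). The two results agree because (12) and (15) are identities under model (1).

```python
import math

def h2_from_loadings(A, b, sigma):
    """H^2 by (9), with var(T) from (5). A[i][m] = a_im, b[i] = b_i, sigma[i] = sigma_i."""
    n, l = len(A), len(A[0])
    e2 = [1.0 - sum(a * a for a in A[i]) - b[i] ** 2 for i in range(n)]  # e_i^2 from (1)
    if min(e2) < 0:
        raise ValueError("loadings imply a negative error variance")
    common = sum(sum(sigma[i] * A[i][m] for i in range(n)) ** 2 for m in range(l))
    specific = sum((sigma[i] * b[i]) ** 2 for i in range(n))
    error = sum(sigma[i] ** 2 * e2[i] for i in range(n))
    return common / (common + specific + error)

def h2_from_correlations(A, sigma):
    """H^2 by (16): r_ij from (10)-(11) and r*_ij from (13)-(14)."""
    n = len(A)
    def common_part(i, j):                   # sum over m of a_im * a_jm
        return sum(x * y for x, y in zip(A[i], A[j]))
    num = sum(sigma[i] * sigma[j] * common_part(i, j)          # r*_ii = h_i^2
              for i in range(n) for j in range(n))
    den = sum(sigma[i] * sigma[j] * (1.0 if i == j else common_part(i, j))  # r_ii = 1
              for i in range(n) for j in range(n))
    return num / den

A = [[0.7, 0.2], [0.5, 0.4], [0.3, 0.6]]   # illustrative two-factor loadings
b = [0.3, 0.2, 0.1]                        # illustrative specific-factor loadings
sigma = [1.0, 1.5, 2.0]
print(h2_from_loadings(A, b, sigma), h2_from_correlations(A, sigma))
```

Note that (16) does not need the $b_i$. The diagonal $r_{ii} = 1$ absorbs specific and error variance together.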
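The coefficient $\alpha$ for the continuous case may now be compared with $H^2$. Cronbach's equation (24) ([3], p. 305) in our notation is

(17)  $\alpha = \dfrac{n}{n-1}\sum_{i \neq j}\sigma_i\sigma_j r_{ij} \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r_{ij}$.

For $\alpha$ to equal $H^2$, the proportion of common-factor variance to total test variance, the numerators of (16) and (17) must be equal. This requirement reduces by means of (13) and (14) to

(18)  $\sum_{i=1}^{n}\sigma_i^2 h_i^2 = \dfrac{1}{n-1}\sum_{i \neq j}\sigma_i\sigma_j r_{ij}$.

Simple algebraic manipulations lead to an equivalent condition:

(19)  $\sum_{i=1}^{n}\sigma_i^2 h_i^2 - \Big(\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_i\sigma_j r^*_{ij}\Big)\Big/ n = 0$.

It should be noted that the case $i = j$ is not excluded in equation (19) as it was in (18). Finally, by (14) and (15), (19) leads to

(20)  $\sum_{m=1}^{l}\Big[\sum_{i=1}^{n}\sigma_i^2 a_{im}^2 - \Big(\sum_{i=1}^{n}\sigma_i a_{im}\Big)^2\Big/ n\Big] = 0$.

Each bracketed term is $n$ times the variance of the $\sigma_i a_{im}$ over items and is therefore nonnegative. Hence (20) is equivalent to the assertion that the standard deviation of the $\sigma_i a_{im}$ must be zero for every $m$, or, equivalently, to the assertion that $a_{im} = k_m/\sigma_i$, where $k_m$ is constant over $i$ but may vary over $m$. Combining this result with (13) and (14) gives a general term for the reduced correlation matrix:

$r^*_{ij} = \sum_{m=1}^{l} k_m^2 \big/ (\sigma_i\sigma_j)$.

That matrix would then have rank 1, admitting a single common-factor solution with loadings equal to

$\sqrt{\sum_{m=1}^{l} k_m^2}\,\Big/\,\sigma_i$.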
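The equality condition can be illustrated for one common factor. In this sketch, which uses a hypothetical helper and illustrative numbers, $\alpha$ from (17) equals $H^2$ from (16) when each loading is proportional to $1/\sigma_i$. When equal loadings are paired with unequal $\sigma_i$, $\alpha$ falls below $H^2$, consistent with (20) being a sum of nonnegative terms.

```python
def alpha_and_h2(a, sigma):
    """Single common factor: alpha by (17) and H^2 by (16) from loadings a_i and sigma_i."""
    n = len(a)
    off_diag = sum(sigma[i] * sigma[j] * a[i] * a[j]
                   for i in range(n) for j in range(n) if i != j)
    total = off_diag + sum(s ** 2 for s in sigma)                          # (12), r_ii = 1
    h2 = (off_diag + sum((s * x) ** 2 for s, x in zip(sigma, a))) / total  # r*_ii = a_i^2
    alpha = n / (n - 1) * off_diag / total
    return alpha, h2

sigma = [1.0, 2.0, 3.0]
print(alpha_and_h2([0.6 / s for s in sigma], sigma))  # loadings proportional to 1/sigma_i
print(alpha_and_h2([0.6, 0.6, 0.6], sigma))           # equal loadings, unequal sigma_i
```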
In summary of this point, Cronbach's requirement of a single common factor for equality of $H^2$ and $\alpha$ may be expanded to require that each item have a single common-factor loading inversely proportional to its $\sigma_i$. Although this condition is both necessary and sufficient, $H^2$ and $\alpha$ may be almost equal without satisfaction of the condition. For estimation procedures based on dichotomous data, the restrictive assumption $\sigma_i = \sigma$ for all $i$ will be necessary. In that case $\sigma$ will replace $\sigma_i$ and $\sigma_j$ wherever they occur up through (20). In particular, $H^2$ will be defined in a manner almost identical to Jackson and Ferguson's equation for $r_{TT'}$ ([6], eq. 67),

(21)  $H^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} r^*_{ij} \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n} r_{ij}$,

in place of that given in (16), and $\alpha$ will be given by

(22)  $\alpha = \dfrac{n}{n-1}\sum_{i \neq j} r_{ij} \Big/ \sum_{i=1}^{n}\sum_{j=1}^{n} r_{ij}$.

Similarly, the condition (20) required for equivalence of $H^2$ and $\alpha$ will reduce to a requirement of a single common factor with a constant loading equal to $\sqrt{\sum_m k_m^2}/\sigma$.

Theory of the Dichotomous Case

Suppose that a population proportion $P_i$ of persons pass item $i$ of the $n$-item test previously discussed. Can we describe the standard scores corresponding to 0 (fail) and 1 (pass) scores by an expression having the form of (1)? All persons failing the item will have $s_{ip} = -P_i/\sqrt{P_i(1 - P_i)}$; all persons passing the item will have $s_{ip} = (1 - P_i)/\sqrt{P_i(1 - P_i)}$. Now any nontrivial application of (1) implies the existence of at least one nonzero $a_{im}$ with two distinct values of $z_{mp}$, a nonzero $e_i$, and two distinct values of $\eta_{ip}$. Consequently (1) requires that there be at least three values of $s_{ip}$. The simplest case occurs when $a_{im} = e_i = 2^{-1/2}$, $z_{mp} = \pm 1$, and $\eta_{ip} = \pm 1$. Since the $z_{mp}$ and $\eta_{ip}$ are independent, $s_{ip}$ will take on one of three values for various persons, $\sqrt{2}$, $0$, and $-\sqrt{2}$, rather than one of the two $s_{ip}$ values given above.

But this simplest case already contradicts the supposition that dichotomous item scores have a factorial structure of the form shown. The factorial hypothesis of (1) will always imply three or more different score values on each item. One should never, then, suppose that dichotomous distributions may be generated by (1). Correspondingly, one should never factor analyze a product-moment correlation matrix based on 0 and 1 scores (i.e., a matrix of phi coefficients). Although this matrix may be expressed as a product of a "factor" matrix by its transpose, that "factor" matrix will have no direct application to anything in factor theory. The misbehavior of factor loadings based on phi coefficients has been previously observed [5, 9]. The theoretical basis for this misbehavior has not been treated in the present manner.
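The counting argument can be checked directly. This sketch enumerates the simplest case named in the text, assuming $b_i = 0$. It then contrasts that case with the only two standard scores a pass-fail item can take. The function names are hypothetical.

```python
import math
from itertools import product

def simplest_case_values():
    """Distinct s_ip from (1) with a_im = e_i = 2**-0.5, b_i = 0, and z_mp, eta_ip = +/-1."""
    a = e = 2 ** -0.5
    return sorted({round(a * z + e * eta, 12) for z, eta in product((-1, 1), repeat=2)})

def dichotomous_standard_scores(P):
    """Standard scores for fail (0) and pass (1) on an item passed by proportion P."""
    sd = math.sqrt(P * (1 - P))
    return -P / sd, (1 - P) / sd

print(simplest_case_values())            # three values: -sqrt(2), 0, sqrt(2)
print(dichotomous_standard_scores(0.3))  # only two values
```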


In denying the factorial interpretability of dichotomized scores as basic data, we do not completely forswear an interest in the dichotomous case. As a later section will show, dichotomous data can be used in such a way as to estimate parameters, such as $H^2$ and $\alpha$, which are characteristic of the underlying continuous distributions.

We now turn to the matter of reliability determinations for total test scores obtained by summing pass-fail item scores. The hypothetical self-correlation, $r_{TT'_\phi}$, may be defined as

(23)  $r_{TT'_\phi} = \operatorname{cov}_\phi(T, T')/\operatorname{var}_\phi(T) = \sum_{i=1}^{n}\sum_{j=1}^{n}\operatorname{cov}_\phi(i, j')\big/\operatorname{var}_\phi(T)$,

where $j'$ denotes item $j$ on the second administration, and where the $\phi$ subscripts are employed to indicate that dichotomous item scores are employed, causing all interitem correlations to be phi coefficients, $\phi_{ij}$. Use of the facts that

$\sigma_{\phi_i} = \sqrt{P_i(1 - P_i)}$ and $\sigma_{\phi_j} = \sqrt{P_j(1 - P_j)}$

helps (23) become

(24)  $r_{TT'_\phi} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sqrt{P_i(1 - P_i)P_j(1 - P_j)}\;\phi_{ij'}\big/\operatorname{var}_\phi(T)$.

This hypothetical self-correlation may be called a phi-coefficient analogue of $r_{TT'}$. Its value will be less than $r_{TT'}$ and, unlike $r_{TT'}$, it will be sensitive to changes in $P_i$ from item to item.

To obtain a coefficient of equivalence, $H_\phi^2$, for the dichotomous case, any quantity attributable to specific factors in the underlying item scores will be excluded from the numerator of (24). This is done by defining $H_\phi^2$ as

(25)  $H_\phi^2 = \sum_{i=1}^{n}\sum_{j=1}^{n}\sqrt{P_i(1 - P_i)P_j(1 - P_j)}\;\phi^*_{ij}\big/\operatorname{var}_\phi(T)$,

where $\phi^*_{ij} = \phi_{ij}$ when $i \neq j$, and $\phi^*_{ii} = \phi(h_i^2)$ when $i = j$. $\phi(h_i^2)$ is simply the phi coefficient obtained by entering the Chesire, Saffir, and Thurstone tetrachoric correlation charts backwards, using $r_{ii} = h_i^2$ and $P_i$ to obtain the fourfold table necessary for computation of the phi coefficient associated with that $h_i^2$. $H_\phi^2$ is not a measure of common-factoredness: factor structure plus the $P_i$ and $P_j$ have complete control over it. It is the coefficient of equivalence for total test scores based on pass-fail data when the [...] of items has been reduced to exclude specific-factor contributions in the underlying distribution. $H_\phi^2$ may also be called the correlation between tests which are matched item by item for common-factor structure.
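Entering the tetrachoric charts backwards amounts to computing the phi coefficient implied by a bivariate normal correlation and a pair of pass proportions. The sketch below does this with SciPy. It assumes a standard bivariate normal distribution underlies each pair of items, with "pass" meaning a score above the cut point. The function name is hypothetical, and the example values are illustrative. To obtain $\phi(h_i^2)$, one would call it with `rho` equal to $h_i^2$ and both proportions equal to $P_i$.

```python
import math
from scipy.stats import multivariate_normal, norm

def phi_from_tetrachoric(rho, P_i, P_j):
    """Phi coefficient implied by a standard bivariate normal with correlation rho when
    items i and j are passed (scored 1) by proportions P_i and P_j."""
    c_i, c_j = norm.ppf(1.0 - P_i), norm.ppf(1.0 - P_j)   # cut points on the normal scale
    bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    p11 = bvn.cdf([-c_i, -c_j])      # P(both pass) = P(X > c_i, Y > c_j), by symmetry
    return (p11 - P_i * P_j) / math.sqrt(P_i * (1 - P_i) * P_j * (1 - P_j))

# phi(h_i^2) for an item with h_i^2 = .49 and P_i = .30 (illustrative values)
print(phi_from_tetrachoric(0.49, 0.30, 0.30))
```

For median splits the result should agree, up to SciPy's integration tolerance, with the classical value $(2/\pi)\arcsin\rho$.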
An analogue of $\alpha$, $\alpha_\phi$ or $r_{K-R}$, has already been mentioned:

(26)  $\alpha_\phi = r_{K-R} = \dfrac{n}{n-1}\sum_{i \neq j}\sqrt{P_i(1 - P_i)P_j(1 - P_j)}\;\phi_{ij}\big/\operatorname{var}_\phi(T)$.


A comparison of (25) and (26) shows that $r_{K-R}$ is a coefficient of equivalence when

(27)  $\sum_{i=1}^{n} P_i(1 - P_i)\,\phi(h_i^2) = \dfrac{1}{n-1}\sum_{i \neq j}\sqrt{P_i(1 - P_i)P_j(1 - P_j)}\;\phi_{ij}$.

This condition is analogous to requirement (18), and both are equivalent, for the case of parallel tests, to Jackson and Ferguson's [6] assumption that the mean interitem covariance between tests be equal to the mean interitem covariance within tests.

In the special case where $P_i(1 - P_i)$ is constant over all $i$ and $\sigma_i$ is also constant over all $i$, the one-to-one correspondence between $r_{ij}$ and $\phi_{ij}$ will imply that the modified condition (20) following equation (22) is necessary and sufficient for $r_{K-R}$ to be a coefficient of equivalence.
The Estimation of $H^2$, $H_\phi^2$, $\alpha$, and $r_{K-R}$

A crude estimation procedure for the four parameters, $H^2$, $H_\phi^2$, $\alpha$, and $r_{K-R}$, is to replace all individual components of (16), (25), (17), and (26) by their estimators. Special problems arise in the application of (16) and (17) when only dichotomous data are available, unless some assumption about the $\sigma_i$ of the underlying continuous distribution is made.

Obviously the $P_i$ and $P_j$ values of the multivariate dichotomous population may be assigned independently of the $\mu_i$ and $\sigma_i$ values for the underlying continuous population. Then the pass-fail splits defining the $\phi_{ij}$ will be independent of the $\mu_i$ and $\sigma_i$, depending only on the $P_i$, $P_j$, and $r_{ij}$. No parameter values from the dichotomous population alone will give any information about the $\mu_i$ and $\sigma_i$. Of the assumptions that might be made, the assumption $\sigma_i = \sigma$ for all $i$ seems most satisfactory. Therefore, it will be employed throughout the remainder of this paper.

The estimation of $H^2$ [...] improvement in the [...] test which could be obtained by continuous rather than dichotomous scoring of each item. With such materials as tests of physical strength, it may be quite feasible to increase substantially the coefficient of equivalence by a change in scoring methods without changing the test items employed. (c) The sensitivity of $r_{K-R}$ to variation in $P_i$ is well known. Many attempts to control test homogeneity have centered on control of the $P_i$. Comparisons between $H^2$ and $H_\phi^2$ and between $\alpha$ and $r_{K-R}$ will serve to emphasize that a test with high $r_{K-R}$ because of homogeneous $P_i$ may nevertheless have less factorial homogeneity than a test with a lower $r_{K-R}$.


A Sampling Study 


The unit of data for this study is a sextuplet of numbers representing item scores on a six-item "test" given to a hypothetical "subject." Eight samples of 500 such subjects' item scores were obtained, one sample from each of eight populations defined by specifying the factor loadings of the items in the test associated with that population of scores. The factor loadings selected are presented in Table 1.
TABLE 1 


Item Factor Loadings and Test Parameter Values for Eight Contrived
Populations of Scores from Six-Item Tests


Loading for Item Number 


Population x 2 3 ` 5 6 
.TDh6 «Ττνό .7706 «TTS Т6 -7736 
a „8660 .8660 .8660 ο o o 
aie o ο «8660 „8660 8660 


о 5883 .588} ο 588% -5884 


XV ayo 
513 .588. о «568 «δι O «388; 
811 .8321 .8321 0 .8321 .8321 ο 

V a12 o ma бй 0 مء‎ «1161. 
e «161 
213 261 ο  .Mé аб ο Dr 

«5291. 

TE sn .5291 .5291 .5291 91 .5291 «3 

D «3180 
VII ail .3180 «3180 3180 «3180. 3180. 
ο 
sl садто иг 0 0072 κ a 
«9036 - 
vor ai2 o 2236 «236 0 
+2236 


x o «2536 +2236 0 


Given the $a_{im}$ and $e_i$ values for any population, we employed a table of random normal numbers ([4], Table 2) with $\mu = 0$, $\sigma = 1$, to obtain a different set of 500 $z_{mp}$ or $\eta_{ip}$ values for each factor and for error. Equation (1) was employed to obtain $s_{ip}$ values for each person on each item.

After all $s_{ip}$ values had been obtained for a test, the 500 scores for each item were dichotomized on the basis of a pass-fail criterion. This permitted determination of tetrachoric correlation coefficients for use in computing $\hat H^2$ and $\hat\alpha$, estimators of $H^2$ and $\alpha$. The dichotomized scores were also employed in calculating $\hat H_\phi^2$ and $\hat r_{K-R}$. In the determination of $\hat H_\phi^2$ and $\hat r_{K-R}$, the sample proportions were used in place of the population $P_i$, introducing some inaccuracy in their values.

The dichotomization of item scores was performed twice, once with the sample proportion of 1 scores, $p_i$, fixed at .50 for every item in every test, and once with $p_i$ variable from item to item. In the latter case $p_i$ ranged from .25 to .75, with $p_1 = .25$ [...] $p_6 = .75$ for every test. Each of the dichotomizations led to estimates of the parameter values.
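The sampling design can be imitated in a few lines of NumPy. This sketch departs from the study in several stated ways. It uses NumPy's random generator instead of a printed table of random normal numbers. It simulates only a single-factor population, with all six loadings equal to .5291, a value that appears in Table 1. It computes only two coefficients: $\alpha$ on the continuous scores, and $r_{K-R}$ (coefficient $\alpha$ applied to 0/1 scores, i.e., KR-20) on median-split scores. It does not compute tetrachorics or $\hat H^2$. The function names and the seed are arbitrary.

```python
import numpy as np

def simulate_item_scores(loadings, n_persons, rng):
    """Equation (1) with one common factor and no specific factor: s = a*z + e*eta."""
    a = np.asarray(loadings, dtype=float)
    e = np.sqrt(1.0 - a ** 2)                        # error loadings e_i
    z = rng.standard_normal((n_persons, 1))          # common-factor scores z_p
    eta = rng.standard_normal((n_persons, a.size))   # error scores eta_ip
    return z * a + eta * e

def dichotomize(scores, p_pass):
    """Score 1 for the top p_pass[i] proportion on item i, 0 otherwise."""
    cuts = np.array([np.quantile(scores[:, i], 1.0 - p) for i, p in enumerate(p_pass)])
    return (scores > cuts).astype(int)

def coefficient_alpha(X):
    """Cronbach's alpha; applied to 0/1 item scores it is KR-20 (r_{K-R})."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return n / (n - 1) * (1.0 - item_var / total_var)

rng = np.random.default_rng(1957)
scores = simulate_item_scores([0.5291] * 6, 500, rng)
X = dichotomize(scores, [0.5] * 6)
print(coefficient_alpha(scores), coefficient_alpha(X))
```

With these loadings the population values for the continuous scores are $\alpha = H^2 \approx .70$. The median-split value should come out noticeably lower, in line with the argument that $r_{K-R}$ responds to the dichotomization and not only to the factor structure.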


Results

Table 2 presents a comparison of sample and population values of all four coefficients, both for the fixed $p_i$ case and the variable $p_i$ case. Within the limits of the samples employed, our estimators are quite satisfactory. The mean constant error of any estimator across eight populations never exceeds .002 in absolute value, and no individual coefficient is in error by more than .058.


“45k απο 897 50h узу ЛЮ „ацы „оз 
+698 .τοῬ 990 601. , 76 Το „бе 68 
TB T? a Sa Л] Лә „668 „616 
ΠΒ 2S μὴν. мы ы E TT 
389 .398 587 


"542 36 до «360 .356 
-336 .30 | ag 


:36 356 309 «416 „251 




In single-factored populations I, VI, and VII, $H^2$ and $\alpha$ are equal, and $H_\phi^2$ and $r_{K-R}$ are equal for the homogeneous $p_i$ case only. In some of the other populations these coefficients differ markedly, indicating that $\alpha$ and $r_{K-R}$ are poor approximations to coefficients of equivalence in those cases. A large discrepancy between sample values, $\hat H^2$ and $\hat\alpha$, or $\hat H_\phi^2$ and $\hat r_{K-R}$, appears indicative of multiple-factoredness or of wide dispersion of a single common factor's loadings. The converse statement is not true.

Unlike $H^2$ and $\alpha$, which are invariant under different dichotomizations of the same underlying scores, $H_\phi^2$ and $r_{K-R}$ are reduced by introducing heterogeneity of the $p_i$. The estimators $\hat H_\phi^2$ and $\hat r_{K-R}$ also show this effect.


REFERENCES

[1] Chesire, L., Saffir, M., and Thurstone, L. L. Computing diagrams for the tetrachoric correlation coefficient. Chicago: Univ. of Chicago Bookstore, 1933.

[2] Cronbach, L. J. Test "reliability": its meaning and determination. Psychometrika, 1947, 12, 1-16.

[3] Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 1951, 16, 297-334.

[4] Dixon, W. J. and Massey, F. J., Jr. Introduction to statistical analysis. New York: McGraw-Hill, 1951.

[5] Ferguson, G. A. The factorial interpretation of test difficulty. Psychometrika, 1941, 6, 323-329.

[6] Jackson, R. W. B. and Ferguson, G. A. Studies on the reliability of tests. Toronto: Department of Educational Research, Bulletin No. 12, 1941.

[7] Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.

[8] Thurstone, L. L. Multiple-factor analysis. Chicago: Univ. of Chicago Press, 1947.

[9] Wherry, R. J. and Gaylord, R. H. The concept of test and item reliability in relation to factor-pattern. Psychometrika, 1943, 8, 247-269.

[10] Wilks, S. S. Mathematical statistics. Princeton: Princeton Univ. Press, 1943.

Manuscript received 1/3/56
Revised manuscript received 4/24/57




PSYCHOMETRIKA—VOL. 22, NO. 4
DECEMBER, 1957 


THE USE OF CONFIGURAL ANALYSIS FOR THE EVALUATION 
OF TEST SCORING METHODS* 


H. G. OSBURN 


SOUTHERN ILLINOIS UNIVERSITY 
AND 


ARDIE LUBIN 
WALTER REED ARMY INSTITUTE OF RESEARCH 


A method based on configural analysis has been given whereby test scoring techniques can be evaluated to see if they have optimal validity. Configural analysis has also been used to show how three well known item scoring techniques, multiple regression, total score, and multiple cut-off, imply (for optimal validity) certain conditions on the answer pattern means. The method is illustrated by a worked example.


The purpose of this article is to demonstrate a method whereby test scoring techniques can be evaluated to see if they have maximum validity. In a previous paper [3] a technique of pattern scoring of test items for the prediction of a quantitative criterion was presented. The basic notion used was that of a configural scale, defined as follows: given a test of $t$ items and a quantitative criterion, form all possible answer patterns and assign to each subject a score which is the mean criterion score for all subjects in his answer pattern. This set of scores is called a configural scale. It was shown that, in the analysis sample, of all possible ways of scoring the $t$ items, the configural scale provides the best least squares prediction of the criterion. It was further shown that the configural scale could be represented exactly by a polynomial function of the item scores if the items are dichotomous. However, the concept of the configural scale is not restricted to dichotomous items. If the items are polychotomous, the only change is that the number of possible answer patterns will increase.
Theory 
A. The equav model 


The configural scale is defined as the set of answer pattern means, and can be represented by a polynomial function of the item scores. An example of the configural scale and polynomial equation for two items is given in Table 1. In Table 1 the answer patterns are designated by $A_0$ for the answer pattern containing all yes responses, $A_1$ for the answer pattern with a no to Item 1 and a yes to all other items, $A_2$ for the answer pattern with a no to Item 2 and a yes to all other items, and, in general, $A_r$, where $r$ designates the items to which the subject has responded no. The frequency and the criterion mean of each answer pattern are designated by the same subscript; e.g., $n_0$ and $C_0$ are the frequency and criterion mean, respectively, of $A_0$, the yes-yes pattern.

*We are indebted to Professor James G. Taylor for his helpful suggestions.


TABLE 1
The Configural Scale and Equav Scores for Two Items

Answer    Items    Answer pattern   Criterion        Equav scores
pattern   1   2    frequency        means       X_0   X_1   X_2   X_12
A_0       Y   Y    n_0              C_0          1     1     1     1
A_1       N   Y    n_1              C_1          1    -1     1    -1
A_2       Y   N    n_2              C_2          1     1    -1    -1
A_12      N   N    n_12             C_12         1    -1    -1     1

Equav regression coefficients               d_0   d_1   d_2   d_12


Each item is scored +1 for a yes response and −1 for a no response. Let $u_k$ designate the score for the $k$th item. Let $X_j$ be a polynomial term, where $j$ indicates the items which form the term. The score on $X_j$ is obtained by actually multiplying the item scores together; e.g., $X_1 = u_1$, $X_{12} = u_1u_2$, $X_{123} = u_1u_2u_3$, and so on. These polynomial terms are called equav scores. As will be explained in detail later, the term equav is used to refer to the model for a factorial analysis of variance with equal cell frequencies.

The matrix $M$ as shown in Table 1 is the set of equav scores whose rows are the $2^t$ answer patterns and whose columns are the $2^t$ polynomial terms. The general entry, $m_{rj}$, equals $(-1)^g$, where $g$ is the number of common items in the $r$th answer pattern and the $j$th polynomial term, $X_j$. $X_0$ is a dummy term with a score of +1 for every individual, to allow for the constant term in the polynomial equation.

In Table 1 the equav predictor for the criterion score of the $i$th individual is

(1)  $\hat{C}_i = d_0X_{0,i} + d_1X_{1,i} + d_2X_{2,i} + d_{12}X_{12,i}$,

where
$\hat{C}_i$ is the predicted criterion score for the $i$th individual,
$X_{j,i}$ is the score of the $i$th individual on the $j$th polynomial term,
$d_j$ is the least squares regression coefficient for the $j$th polynomial term.
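The matrix $M$ can be generated mechanically from the rule $m_{rj} = (-1)^g$. The following minimal Python sketch is our illustration and is not taken from the article. It assumes that answer patterns and polynomial terms are both labelled by tuples of item numbers: for a pattern, the items answered no; for a term, the items whose scores are multiplied. The helper names `subsets` and `equav_matrix` are hypothetical.

```python
from itertools import combinations


def subsets(t):
    """All subsets of items 1..t, ordered by size: (), (1,), ..., (t,), (1, 2), ..."""
    items = range(1, t + 1)
    return [combo for size in range(t + 1) for combo in combinations(items, size)]


def equav_matrix(t):
    """Return (labels, M) with M[r][j] = (-1)**g.

    g is the number of items shared by answer pattern r (items answered 'no')
    and polynomial term j (items whose +1/-1 scores are multiplied together).
    """
    labels = subsets(t)
    matrix = [[(-1) ** len(set(r) & set(j)) for j in labels] for r in labels]
    return labels, matrix


labels, M = equav_matrix(2)
for label, row in zip(labels, M):
    print(label, row)
```

For $t = 2$ the printed rows reproduce the equav columns $X_0$, $X_1$, $X_2$, $X_{12}$ of Table 1. Because the labels are ordered by size, the constant term and the $t$ first-order terms occupy the first $t + 1$ columns.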




The exact solution for the $2^t$ regression coefficients can be obtained as follows. Let $Z$ be an $N$ by $2^t$ matrix whose general element $z_{ij}$ is the equav score of the $i$th individual on the $j$th polynomial term. $Z$ is an expanded form of $M$ where the $r$th row of $M$ is repeated $n_r$ times. Let $c$ be an $N$-rowed column vector, where $c_i$ is the criterion score of the $i$th individual. Let $d$ be the $2^t \times 1$ column vector of regression coefficients. Then

(2)  $d = (Z'Z)^{-1}Z'c$

is the set of regression coefficients which gives the exact least squares fit. The predicted score $\hat{C}_i$ of the $i$th individual will be the mean of his answer pattern.

Let $n$ be the diagonal matrix of answer pattern frequencies. Then

(3)  $Z'Z = M'nM$

and

(4)  $Z'c = M'nC$,

where $C$ is the $2^t$ by 1 column vector of answer pattern means. Substituting (3) and (4) in (2),

(5)  $d = (M'nM)^{-1}M'nC$.

Since $M = M'$ and $M^{-1} = 2^{-t}M$,

(6)  $d = M^{-1}n^{-1}M'^{-1}M'nC = M^{-1}C = 2^{-t}MC$.

Thus, each regression coefficient is equal to the algebraic sum of the $2^t$ criterion averages divided by $2^t$.
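Equation (6) turns the regression into signed sums of means, with no matrix inversion. The following minimal sketch assumes the answer pattern means are given as integers or fractions in a dictionary keyed by the tuple labels used for $M$. The two-item means are hypothetical. They are chosen only to show that $Md$ reproduces every pattern mean exactly.

```python
from fractions import Fraction
from itertools import combinations


def subsets(t):
    items = range(1, t + 1)
    return [combo for size in range(t + 1) for combo in combinations(items, size)]


def equav_coefficients(means, t):
    """Equation (6): d_j = 2**-t * sum_r (-1)**|r & j| * C_r (means must be ints or Fractions)."""
    labels = subsets(t)
    return {
        j: Fraction(sum((-1) ** len(set(r) & set(j)) * means[r] for r in labels), 2 ** t)
        for j in labels
    }


def predict(d, pattern):
    """Equav prediction for one answer pattern: sum_j m_rj * d_j."""
    return sum((-1) ** len(set(pattern) & set(j)) * coef for j, coef in d.items())


# Hypothetical two-item answer pattern means (not from the article).
means = {(): 10, (1,): 6, (2,): 8, (1, 2): 2}
d = equav_coefficients(means, 2)
print(d)
print({r: predict(d, r) for r in means})
```

Here $d_0 = 13/2$, $d_1 = 5/2$, $d_2 = 3/2$ and $d_{12} = -1/2$. The predictions equal the four means, as the statement that each subject's predicted score is the mean of his answer pattern requires.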
Scoring each item alternative +1 or −1 is exactly analogous to a two-level factorial analysis of variance model when all cells (answer patterns) have equal frequencies. For this reason the term equav is used to denote this method of scoring the polynomial terms. In previous papers [1, 3] the items have been scored +1 for a yes response and 0 for a no response. This leads to considerable difficulty if item scores are arbitrary: a reversal of an item score involves nonlinear transformations of the polynomial terms which alter the absolute values of the regression coefficients.

Equav scoring has certain algebraic advantages (such as $M = M'$ and $M^{-1} = 2^{-t}M$). The most important advantage is that the absolute values of the regression coefficients are invariant no matter what item scores are reversed. The proof follows.

Reverse the equav score on the $k$th item. Every −1 becomes +1, and every +1 becomes −1. In other words, $-u_k$ is substituted for $u_k$. This amounts to multiplying a column of $M$ by −1 if $u_k$ appears in that polynomial term. In general, reversing any set of item scores is equivalent to multiplying the appropriate columns of $M$ by −1.




Let $H$ be a $2^t$ by $2^t$ diagonal matrix containing −1 in each diagonal cell corresponding to the appropriate column of $M$. All other diagonal cells contain +1. Let $P$ be the $M$ matrix after the item scores have been reversed. Then $P = MH$. Let $e$ denote the set of regression coefficients for the polynomial equation using the reversed scores. Substituting in (5),

(7)  $e = (P'nP)^{-1}P'nC$;
(8)  $e = (H'M'nMH)^{-1}H'M'nC$;
(9)  $e = H^{-1}M^{-1}n^{-1}M'^{-1}H'^{-1}H'M'nC$;
(10)  $e = H^{-1}M^{-1}C$.

Since

(11)  $H^{-1} = H$ and $M^{-1} = 2^{-t}M$,
(12)  $e = H(2^{-t}MC)$.

Therefore

(13)  $e = Hd$.

Premultiplying $d$ by $H$ simply reverses the sign of certain of the $d$ coefficients. This proves that the absolute values of the equav coefficients are invariant under reversal of item scores.
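The invariance result can be checked numerically. Reversing item $k$ relabels each answer pattern $r$ as $r$ with item $k$ toggled, so the pattern means are only permuted. The following minimal sketch uses hypothetical three-item integer means and the same tuple labels as above. It recomputes the coefficients from the permuted means and confirms that $e = Hd$: only the terms containing the reversed item change sign.

```python
from fractions import Fraction
from itertools import combinations


def subsets(t):
    items = range(1, t + 1)
    return [combo for size in range(t + 1) for combo in combinations(items, size)]


def equav_coefficients(means, t):
    """Equation (6) with exact arithmetic; means must be ints or Fractions."""
    labels = subsets(t)
    return {
        j: Fraction(sum((-1) ** len(set(r) & set(j)) * means[r] for r in labels), 2 ** t)
        for j in labels
    }


def reverse_item(means, k):
    """Rescore item k: subjects in pattern r now fall in pattern r with item k toggled."""
    return {tuple(sorted(set(pattern) ^ {k})): mean for pattern, mean in means.items()}


# Hypothetical three-item means (not from the article).
labels = subsets(3)
means = dict(zip(labels, [12, 7, 9, 4, 6, 3, 5, 1]))
d = equav_coefficients(means, 3)
e = equav_coefficients(reverse_item(means, 2), 3)
for j in labels:
    sign = -1 if 2 in j else 1  # diagonal entry of H for term j
    assert e[j] == sign * d[j]
print("only the terms containing item 2 changed sign")
```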


Special case of linearity, where all the coefficients of the nonlinear terms are zero:

(14)  $\hat{C} = w_0 + w_1X_1 + w_2X_2 + \cdots + w_tX_t$.

Let $Z_l$ be the $N$ by $t + 1$ submatrix formed by taking the first $t + 1$ columns of $Z$. Then

(15)  $w_l = (Z_l'Z_l)^{-1}Z_l'c$,

where $w_l$ is the column of $t + 1$ linear regression coefficients. Let $K_l$ be the $2^t$ by $t + 1$ submatrix formed by taking the first $t + 1$ columns of $M$. Then $Z_l'Z_l = K_l'nK_l$ and $Z_l'c = K_l'nC$. Therefore

(16)  $w_l = (K_l'nK_l)^{-1}K_l'nC$.

Note that since $K_l$ is rectangular, no simple inverse exists and equation (16) cannot be further simplified.
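Because (16) needs a genuine matrix inverse, the following sketch solves the normal equations $(K_l'nK_l)w_l = K_l'nC$ by Gauss-Jordan elimination with exact fractions. It assumes the first $t + 1$ columns of $M$ are the constant and first-order terms. The two-item frequencies and means are hypothetical and were chosen to be exactly linear, with $C = 3 + 2u_1 - u_2$, so the solution should recover those weights despite the unequal frequencies.

```python
from fractions import Fraction
from itertools import combinations


def subsets(t):
    items = range(1, t + 1)
    return [combo for size in range(t + 1) for combo in combinations(items, size)]


def solve(a, b):
    """Solve a x = b for a square, nonsingular a by Gauss-Jordan elimination on Fractions."""
    size = len(b)
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]


def linear_weights(freqs, means, t):
    """Equation (16): w_l = (K_l' n K_l)^-1 K_l' n C, with K_l = first t + 1 columns of M."""
    patterns = subsets(t)
    terms = patterns[: t + 1]  # (), (1,), ..., (t,)
    k = [[(-1) ** len(set(r) & set(j)) for j in terms] for r in patterns]
    size = t + 1
    ktnk = [[sum(freqs[p] * k[row][i] * k[row][j] for row, p in enumerate(patterns))
             for j in range(size)] for i in range(size)]
    ktnc = [sum(freqs[p] * means[p] * k[row][i] for row, p in enumerate(patterns))
            for i in range(size)]
    return solve(ktnk, ktnc)


# Hypothetical data: unequal frequencies, means exactly 3 + 2*u1 - u2.
freqs = {(): 5, (1,): 2, (2,): 3, (1, 2): 4}
means = {(): 4, (1,): 0, (2,): 6, (1, 2): 2}
print(linear_weights(freqs, means, 2))
```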


B. Restrictions on the answer pattern means


Any method of scoring the $t$ items which yields optimal validity in the population and yet uses fewer than $2^t$ parameters imposes certain restrictions on the population answer pattern means. In this paper, restrictions imposed by three well-known test scoring methods, multiple regression, total score, and multiple cut-off, are considered.

Table 2 summarizes the necessary and sufficient conditions (in the mathematical sense) for each of the three scoring techniques to yield optimal validity. The restrictions on the equav coefficients amount to definitions in the case of multiple regression and total score. From these definitions a number of restrictions on the answer pattern means can be derived.


TABLE 2
Conditions for Optimal Validity

Scoring method        Equav coefficients                     Answer pattern means

Multiple regression   All nonlinear coefficients are zero.   The sum of complementary answer pattern
                                                             means is equal to a constant, 2d_0.

Total score           1. All nonlinear coefficients are      1. The sum of complementary answer pattern
                         zero.                                  means is equal to a constant, 2d_0.
                      2. All first-order coefficients are    2. All answer patterns whose sums of item
                         equal.                                 scores are equal have equal means.

Multiple cut-off      Only one coefficient differs from      Only one mean differs from the others.
                      the others in absolute value.


One of the most useful restrictions on the answer pattern means in the linear case is given in Table 2. First, let us define complementary answer patterns. Answer pattern $A_r$ is complementary to $A_{r'}$ if, and only if, every item response in $A_r$ is reversed in $A_{r'}$. For example, $A_{12}$ (NNYY) is the complement of $A_{34}$ (YYNN). In our four-item case

(17)  $C_r = d_0 + d_1X_1 + d_2X_2 + d_3X_3 + d_4X_4$,
(18)  $C_{r'} = d_0 + d_1X_1' + d_2X_2' + d_3X_3' + d_4X_4'$.

For the equav model, $X_1 + X_1' = X_2 + X_2' = X_3 + X_3' = X_4 + X_4' = 0$. In general, $X + X' = 0$. Therefore,

(19)  $C_r + C_{r'} = 2d_0 + d_1(X_1 + X_1') + d_2(X_2 + X_2') + d_3(X_3 + X_3') + d_4(X_4 + X_4') = 2d_0$.
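Here is a concrete check of (19). The following small sketch builds purely linear means for four items from the systematic component used later in the worked example ($d_0 = 22$, $d_1 = -1$, $d_2 = 5$, $d_3 = -7$, $d_4 = 9$). It then verifies that every complementary pair of answer patterns sums to $2d_0 = 44$. The function name `linear_mean` is hypothetical.

```python
from itertools import combinations

ITEMS = (1, 2, 3, 4)
D0, FIRST_ORDER = 22, {1: -1, 2: 5, 3: -7, 4: 9}


def linear_mean(no_items):
    """C_r = d_0 + sum_k d_k u_k, with u_k = -1 if item k is answered 'no', else +1."""
    return D0 + sum(dk * (-1 if k in no_items else 1) for k, dk in FIRST_ORDER.items())


patterns = [combo for size in range(5) for combo in combinations(ITEMS, size)]
for pattern in patterns:
    complement = tuple(k for k in ITEMS if k not in pattern)
    assert linear_mean(pattern) + linear_mean(complement) == 2 * D0
print(linear_mean((1, 2)), linear_mean((3, 4)))  # 20 24
```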


The additional restriction for the total score case, that all answer patterns with the same total score have equal means, can be derived as follows. By definition, the first-order coefficients are equal, i.e.,

$d_1 = d_2 = d_3 = d_4 = d$.

Therefore,

$C_r = d_0 + dX_1 + dX_2 + dX_3 + dX_4 = d_0 + d(X_1 + X_2 + X_3 + X_4)$.

Thus all answer patterns with the same sum of item scores, $(X_1 + X_2 + X_3 + X_4)$, will have the same mean.


The basic definition of the multiple cut-off is that only two scores are assigned: the subjects in one answer pattern are assigned one score, and the subjects in all other answer patterns are assigned the other score. This implies that for optimal validity all except one of the answer pattern means should be equal. Without losing generality, it can be assumed that the mean which differs is $C_0$. Let $C$ denote the constant mean for all other answer patterns:

(20)  $C_r = C$ for every $r \neq 0$.

From (6), for calculating the regression coefficients from the answer pattern means, it follows that

(21)  $d_0 = [C_0 + (2^t - 1)C]/2^t$,
(22)  $d_1 = (C_0 - C)/2^t$,
(23)  $d_2 = (C_0 - C)/2^t$,

and in general

(24)  $d_j = (C_0 - C)/2^t$, $j \neq 0$.

Thus, in the multiple cut-off case, all coefficients but one will have the same absolute value.
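Here is a quick numerical check of (21) through (24). It assumes $t = 3$ and hypothetical means: $C_0 = 10$ for the all-yes pattern and $C = 4$ for the other seven. Then (21) predicts $d_0 = (10 + 7 \cdot 4)/8 = 19/4$, and (24) predicts $d_j = (10 - 4)/8 = 3/4$ for every other term.

```python
from fractions import Fraction
from itertools import combinations

T = 3
LABELS = [combo for size in range(T + 1) for combo in combinations(range(1, T + 1), size)]


def equav_coefficients(means):
    """Equation (6): d_j = 2**-t * sum_r (-1)**|r & j| * C_r."""
    return {
        j: Fraction(sum((-1) ** len(set(r) & set(j)) * means[r] for r in LABELS), 2 ** T)
        for j in LABELS
    }


c_odd, c_rest = 10, 4  # hypothetical values: only the all-yes pattern differs
means = {r: (c_odd if r == () else c_rest) for r in LABELS}
d = equav_coefficients(means)
print(d[()], {j: d[j] for j in LABELS if j})
```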


C. The F ratio tests


How any hypothesized relation of a specific set of item interactions to the criterion can be tested by means of the F ratio is shown in [3]. Of course, the usual assumptions of normality and homogeneity of variance must be met.

The general F ratio test is as follows. Let $\eta_c^2$ be the configural validity, $r_m^2$ the validity of any specified scoring method, and $p_m$ the number of sample statistics that must be calculated for that method. Then

(25)  $F = \dfrac{\eta_c^2 - r_m^2}{1 - \eta_c^2} \cdot \dfrac{N - 2^t}{2^t - p_m}$,

with $2^t - p_m$ and $N - 2^t$ degrees of freedom. An even more general formula is given in ([3], equation 34).
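Equation (25) is easy to wrap as a function. In the following minimal sketch the argument names are ours. The sample calls use the rounded validities for total score and multiple cut-off from the worked example, each with $p_m = 2$ (one score plus an intercept).

```python
def f_ratio(eta2_config, r2_method, n_subjects, t_items, p_statistics):
    """Equation (25): compare a scoring method's validity with the configural validity.

    Returns (F, numerator d.f., denominator d.f.); the degrees of freedom are
    2**t - p_m and N - 2**t.
    """
    cells = 2 ** t_items
    df_num = cells - p_statistics
    df_den = n_subjects - cells
    f = ((eta2_config - r2_method) / (1 - eta2_config)) * (df_den / df_num)
    return f, df_num, df_den


for label, r2 in [("total score", 0.489), ("multiple cut-off", 0.060)]:
    f, df1, df2 = f_ratio(0.612, r2, 500, 4, 2)
    print(f"{label}: F = {f:.3f} on ({df1}, {df2}) d.f.")
```

With $r_m^2 = 0$ and $p_m = 1$ the same function gives the overall test of the configural validity.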


Worked Example 


In order to illustrate the method an example was constructed. Five hundred scores were drawn from a table of normal random deviates. These scores were then transformed so that the universe mean was 5 and the universe standard deviation was 10. The frequencies for each answer pattern were calculated by fixing the $p$ values of the four items at $p_1 = .3$, $p_2 = .4$, $p_3 = .5$, $p_4 = .6$ and assuming all items to be statistically independent. The 500 scores were assigned at random to the sixteen answer patterns according to these predetermined frequencies.

To the artificial data described above, a linear systematic component was added. The following arbitrary values were assigned: $d_0 = 22$, $d_1 = -1$, $d_2 = 5$, $d_3 = -7$, $d_4 = 9$. Using these values in (14) gave the systematic component that was added to each answer pattern mean. The column in Table 3 labelled $C$ gives the answer pattern means obtained by these computations.
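The $n$ column of Table 3 follows from the fixed $p$ values under independence. The following short sketch assumes that $p_k$ is the proportion answering yes to item $k$. It reproduces, for example, 18 subjects for YYYY and 42 for NYYY, and it confirms that every expected count is a whole number.

```python
from fractions import Fraction
from itertools import product

N = 500
P_YES = (Fraction(3, 10), Fraction(4, 10), Fraction(5, 10), Fraction(6, 10))  # p_1..p_4


def pattern_frequencies(n, p_yes):
    """Expected count per answer pattern when the items are statistically independent."""
    freqs = {}
    for answers in product("YN", repeat=len(p_yes)):
        prob = Fraction(1)
        for answer, p in zip(answers, p_yes):
            prob *= p if answer == "Y" else 1 - p
        freqs["".join(answers)] = n * prob
    return freqs


freqs = pattern_frequencies(N, P_YES)
print(freqs["YYYY"], freqs["NYYY"], freqs["NNNN"], sum(freqs.values()))  # 18 42 42 500
```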


TABLE 3
Basic Data for Worked Example

Answer   Items
pattern  1234    n      ΣC       ΣC²      (ΣC)²/n     Deviance        C         d        Ĉ   T  M
A_0      YYYY   18     605    22,581   20,334.722    2,246.277   33.611   26.423   33.465   2  0
A_1      NYYY   42   1,556    61,114   57,646.095    3,467.905   37.048   -1.523   36.397   3  0
A_2      YNYY   27     604    17,448   13,511.703    3,936.296   22.370    5.145   23.011   1  0
A_3      YYNY   18     861    42,275   41,184.500    1,090.500   47.833   -6.230   45.745   3  1
A_4      YYYN   12     168     3,836    2,352.000    1,484.000   14.000    9.541   14.679   1  0
A_12     NNYY   63   1,725    52,863   47,232.142    5,630.857   27.381    -.121   25.943   2  0
A_13     NYNY   42   1,997   100,023   94,952.595    5,070.405   47.548    -.007   48.677   4  0
A_14     NYYN   28     505    11,821    9,108.036    2,712.964   18.036     .282   17.611   2  0
A_23     YNNY   27     947    35,765   33,215.148    2,549.852   35.074     .336   35.291   2  0
A_24     YNYN   18      84     1,972      392.000    1,580.000    4.667     .402    4.225   0  0
A_34     YYNN   12     291     7,813    7,056.750      756.250   24.250     .369   26.959   2  0
A_123    NNNY   63   2,321    91,363   85,508.587    5,854.413   36.841    -.217   38.223   3  0
A_124    NNYN   42     186     5,738      823.714    4,914.286    4.429     .574    7.157   1  0
A_134    NYNN   28     846    27,548   25,561.285    1,986.714   30.214    -.863   29.891   3  0
A_234    YNNN   18     313     6,495    5,442.722    1,052.278   17.389    -.656   16.505   1  0
A_1234   NNNN   42     927    24,471   20,460.214    4,010.786   22.071     .157   19.437   2  0
Sum                                                              422.762   33.612  423.216
Total          500  13,936   513,126  388,424.192*  124,701.808*

*In the Total row these entries are (ΣC)²/N and T, the sum of squares about the total mean.

ΣCĈ = 463,623.698     ΣĈ = 13,935.300     ΣĈ² = 463,601.606     (ΣĈ and ΣĈ² weighted by n)
ΣT = 1,100            ΣT² = 2,890         ΣTC = 36,011
ΣM = 18               ΣM² = 18            ΣMC = 861




Step 1—Calculation of the configural validity 


First, a one-way analysis of variance is computed. The column labelled Deviance in Table 3 contains the sum of squared deviations (sum of squares) about each answer pattern mean. The sum of the 16 answer pattern deviances is $W$, the within-group sum of squares. The column labelled $(\Sigma C)^2/n$ is used to compute $B$, the between answer patterns sum of squares: $B = \sum_r (\Sigma C)_r^2/n_r - (\Sigma C)^2/N$.


TABLE 4
Analysis of Variance

Source                      df      Deviance    Mean Square
Between answer patterns     15    76,358.020      5,090.535
Within                     484    48,343.784         99.884
Total                      499   124,701.804

η_c² = .612    F = 50.964


These figures, along with $T$, the deviance (sum of squares) about the total mean, are given in the usual analysis of variance form in Table 4. The formula for the configural validity is

(26)  $\eta_c^2 = B/T = \dfrac{76{,}358.020}{124{,}701.804} = .612325$.

The test of significance is

(27)  $F = \dfrac{\eta_c^2}{1 - \eta_c^2} \cdot \dfrac{N - 2^t}{2^t - 1} = \dfrac{.612325}{.387675} \cdot \dfrac{484}{15} = 50.964$.

Since the .001 confidence level is 2.577, the configural validity is obviously greater than zero. If the F ratio were not significant, the analysis would be discontinued, since no method of scoring the items would give a better-than-chance prediction of the criterion.
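Step 1 can be reproduced from the per-pattern totals. The following sketch takes $(n, \Sigma C, \Sigma C^2)$ for each pattern as listed in Table 3; these values reproduce the printed column totals of 500, 13,936 and 513,126. It computes $B$, $W$, $T$, $\eta_c^2$ and the F of (27) in floating point.

```python
# Answer pattern -> (n, sum of C, sum of C squared), from Table 3.
TABLE3 = {
    "YYYY": (18, 605, 22581), "NYYY": (42, 1556, 61114), "YNYY": (27, 604, 17448),
    "YYNY": (18, 861, 42275), "YYYN": (12, 168, 3836), "NNYY": (63, 1725, 52863),
    "NYNY": (42, 1997, 100023), "NYYN": (28, 505, 11821), "YNNY": (27, 947, 35765),
    "YNYN": (18, 84, 1972), "YYNN": (12, 291, 7813), "NNNY": (63, 2321, 91363),
    "NNYN": (42, 186, 5738), "NYNN": (28, 846, 27548), "YNNN": (18, 313, 6495),
    "NNNN": (42, 927, 24471),
}


def configural_anova(table):
    """One-way ANOVA over answer patterns; returns B, W, T, eta_c^2 and F of (27)."""
    n_total = sum(n for n, _, _ in table.values())
    sum_c = sum(s for _, s, _ in table.values())
    sum_c2 = sum(s2 for _, _, s2 in table.values())
    correction = sum_c ** 2 / n_total  # (sum C)^2 / N
    between_raw = sum(s ** 2 / n for n, s, _ in table.values())
    total = sum_c2 - correction  # T
    between = between_raw - correction  # B
    within = sum_c2 - between_raw  # W
    cells = len(table)
    eta2 = between / total
    f = (between / (cells - 1)) / (within / (n_total - cells))
    return between, within, total, eta2, f


B, W, T, eta2, F = configural_anova(TABLE3)
print(f"B={B:.3f} W={W:.3f} T={T:.3f} eta2={eta2:.4f} F={F:.3f}")
```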


Step 2—Calculation of the polynomial regression coefficients


In Table 3, the column labelled $d$ contains all 16 polynomial regression coefficients. Each coefficient was computed by adding together the 16 means (appropriately signed) and dividing by $2^4$. The sign of each mean is given by $(-1)^g$, where $g$ is the number of common items in the subscripts of the regression coefficient and the mean. For example,

$d_0 = \frac{1}{2^4}[C_0 + C_1 + C_2 + C_3 + C_4 + C_{12} + C_{13} + C_{14} + C_{23} + C_{24} + C_{34} + C_{123} + C_{124} + C_{134} + C_{234} + C_{1234}] = \frac{422.762}{16} = 26.423$,

$d_1 = \frac{1}{2^4}[C_0 - C_1 + C_2 + C_3 + C_4 - C_{12} - C_{13} - C_{14} + C_{23} + C_{24} + C_{34} - C_{123} - C_{124} - C_{134} + C_{234} - C_{1234}] = -1.523$,

and so on.
These coefficients can be scanned to see which scoring method seems to give an optimal prediction of the criterion with the fewest parameters. For example, if in the population the relation between the items and the criterion were exactly linear, the only nonzero coefficients would be $d_0$, $d_1$, $d_2$, $d_3$, and $d_4$. In any actual sample, the other (nonlinear) coefficients would be small but not exactly zero. So one can simply look at the five linear coefficients to see whether their absolute values are larger than those of all the other coefficients. Another such test is to check the frequency of negative values among the eleven nonlinear coefficients. In the linear case, the true probability of a negative value is 1/2. If these crude combinatorial tests do not contradict the hypothesis of linearity, we proceed to the next possibility: that the total score will give maximum validity.

The total score will give maximum validity in the population if all the conditions for linearity are met and, in addition, $|d_1| = |d_2| = |d_3| = |d_4|$; i.e., the absolute values of the first-order coefficients are equal. Again, a crude check of this can be made in the sample by seeing if there is wide variation among the absolute values of the first-order coefficients.

In Table 3 it can be seen that the linear coefficients ($d_0$, $d_1$, $d_2$, $d_3$, and $d_4$) meet the first condition: their absolute values are larger than those of the nonlinear coefficients. The second condition is also met: the ratio of negative nonlinear coefficients was 5/11, which is not significantly different from the expected value of 1/2. Therefore, the hypothesis of linearity is not contradicted.

Next the first-order coefficients were examined to see if the total score was likely to have maximum validity. If so, the first-order coefficients ($d_1$, $d_2$, $d_3$, and $d_4$) would be approximately equal in absolute value. But $d_4$ was more than six times the absolute value of $d_1$. So it is unlikely that total score would have optimal validity.

As mentioned earlier, the above crude tests can be used if the research worker has no definite hypothesis about the optimal scoring method. However, to demonstrate conclusively that linear scoring is sufficient for optimal prediction it is necessary to show that the multiple correlation, $R_{c.1234}$, does not differ significantly from the configural validity $\eta_c$. To demonstrate conclusively that linear scoring is preferable to the total score and the multiple cut-off score, it has to be shown that the total score validity and the multiple cut-off validity are significantly less than the configural validity.
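The scan described in Step 2 can be reproduced from the $n$ and $\Sigma C$ columns of Table 3. The following minimal sketch works in floating point from unrounded means, so a last digit may differ from the hand computation. It prints every $d_j$, then runs the two crude linearity checks and the spread of the first-order coefficients used to judge the total score.

```python
from itertools import combinations

# Answer pattern (tuple of items answered "no") -> (n, sum of C), from Table 3.
COUNTS = {
    (): (18, 605), (1,): (42, 1556), (2,): (27, 604), (3,): (18, 861),
    (4,): (12, 168), (1, 2): (63, 1725), (1, 3): (42, 1997), (1, 4): (28, 505),
    (2, 3): (27, 947), (2, 4): (18, 84), (3, 4): (12, 291), (1, 2, 3): (63, 2321),
    (1, 2, 4): (42, 186), (1, 3, 4): (28, 846), (2, 3, 4): (18, 313),
    (1, 2, 3, 4): (42, 927),
}
LABELS = [combo for size in range(5) for combo in combinations((1, 2, 3, 4), size)]


def equav_coefficients(means):
    """Equation (6) for t = 4: d_j = sum_r (-1)**|r & j| * C_r / 2**4."""
    return {
        j: sum((-1) ** len(set(r) & set(j)) * means[r] for r in LABELS) / 2 ** 4
        for j in LABELS
    }


means = {r: total / n for r, (n, total) in COUNTS.items()}
d = equav_coefficients(means)
first_order = [j for j in LABELS if len(j) == 1]
nonlinear = [j for j in LABELS if len(j) > 1]

for j in LABELS:
    print(j or "(constant)", round(d[j], 3))

# Crude check 1: every first-order coefficient exceeds every nonlinear one in absolute value.
dominates = min(abs(d[j]) for j in first_order) > max(abs(d[j]) for j in nonlinear)
# Crude check 2: number of negative signs among the eleven nonlinear coefficients.
negatives = sum(d[j] < 0 for j in nonlinear)
# Total-score check: ratio of the largest to the smallest first-order coefficient.
spread = max(abs(d[j]) for j in first_order) / min(abs(d[j]) for j in first_order)
print(dominates, negatives, round(spread, 2))
```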


Step 3—Calculation of the multiple correlation 


In Step 2, the linear hypothesis passed the first crude tests. To compute the linear multiple correlation $R_{c.1234}$, $\hat{C}_r$, the predicted criterion score for the $r$th answer pattern, was calculated first. (This may not be the most convenient method, but it does show any large deviations from the linear hypothesis. For a perfect linear fit, $C_r = \hat{C}_r$.) To obtain $\hat{C}$ it was first necessary to compute the linear regression coefficients, i.e., $w_0$, $w_1$, $w_2$, $w_3$, $w_4$, from (16).

The matrices $(K_l'nK_l)$, $(K_l'nK_l)^{-1}$, and $(K_l'nC)$ are presented in Table 5. The regression coefficients are in the column $w$. Then $\hat{C}$ was obtained by applying the equation $\hat{C} = K_lw_l$. The predicted criterion means are presented in the $\hat{C}$ column of Table 3. $R_{c.1234}$ is equal to $r_{C\hat{C}}$, the zero-order correlation between $C$ and $\hat{C}$. This was computed by the well-known formula

(28)  $r^2_{C\hat{C}} = \dfrac{[N\sum C\hat{C} - \sum C\sum\hat{C}]^2}{[N\sum C^2 - (\sum C)^2][N\sum\hat{C}^2 - (\sum\hat{C})^2]}$.


TABLE 5
Computation of Linear Regression Coefficients

        K_l'nK_l                         10^3 (K_l'nK_l)^-1                      K_l'nC        w
        X_0    X_1    X_2   X_3   X_4    X_0     X_1     X_2     X_3     X_4
X_0     500   -200   -100     0   100    2.548    .952    .417   0       -.417    13,936   26.451
X_1    -200    500     40     0   -40     .952   2.381   0       0        0       -6,190   -1.466
X_2    -100     40    500     0   -20     .417   0       2.083   0        0         -278    5.227
X_3       0      0      0   500     0    0       0       0       2.000    0       -3,070   -6.140
X_4     100    -40    -20     0   500    -.417   0       0       0        2.083    7,296    9.393
Sum     300    300    420   500   540    3.500   3.333   2.500   2.000    1.666   11,694   33.465


Column ΣC in Table 3 contains the sums of the criterion scores for each answer pattern. To obtain $\sum C\hat{C}$, each $\hat{C}_r$ was multiplied by the $\Sigma C$ for the $r$th answer pattern, and the results were summed over all patterns. Similarly,

$\sum\hat{C} = \sum_{r=1}^{16} n_r\hat{C}_r$ and $\sum\hat{C}^2 = \sum_{r=1}^{16} n_r\hat{C}_r^2$.

Substituting in (28) from the data in Table 3,

$r^2_{C\hat{C}} = \dfrac{[500(463{,}623.698) - 13{,}936(13{,}935.300)]^2}{[500(513{,}126) - (13{,}936)^2][500(463{,}601.606) - (13{,}935.300)^2]} = .603$.


Step 4—Comparison of the multiple correlation with the configural validity 


Applying (25),

$F = \left(\dfrac{.612 - .603}{.388}\right)\left(\dfrac{500 - 16}{16 - 5}\right) = 1.021$.

Since, for 11 and 484 d.f., the .05 level is 1.750, the F test indicates that the multiple correlation does not differ significantly from the configural validity; i.e., the multiple regression scoring method yields optimal validity.
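Steps 3 and 4 can also be sketched in code. The following sketch builds $K_l'nK_l$ and $K_l'nC$ from the $n$ and $\Sigma C$ columns of Table 3 and solves (16) by Gauss-Jordan elimination. It then forms $\hat{C}_r$, evaluates (28) with frequency-weighted sums, and applies (25) with $p_m = 5$. The intermediate values are unrounded, so $w_0$ (about 26.453) and F differ slightly from the rounded hand figures. The conclusion at the .05 level is the same.

```python
from itertools import combinations

# Answer pattern (tuple of items answered "no") -> (n, sum of C), from Table 3.
COUNTS = {
    (): (18, 605), (1,): (42, 1556), (2,): (27, 604), (3,): (18, 861),
    (4,): (12, 168), (1, 2): (63, 1725), (1, 3): (42, 1997), (1, 4): (28, 505),
    (2, 3): (27, 947), (2, 4): (18, 84), (3, 4): (12, 291), (1, 2, 3): (63, 2321),
    (1, 2, 4): (42, 186), (1, 3, 4): (28, 846), (2, 3, 4): (18, 313),
    (1, 2, 3, 4): (42, 927),
}
SUM_C2 = 513126  # total of the sum-of-squares column of Table 3
ETA2 = 0.612  # configural validity, rounded as in the text
PATTERNS = [combo for size in range(5) for combo in combinations((1, 2, 3, 4), size)]
TERMS = PATTERNS[:5]  # (), (1,), (2,), (3,), (4,): the columns of K_l


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    size = len(b)
    rows = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(size):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][size] / rows[i][i] for i in range(size)]


k = {r: [(-1) ** len(set(r) & set(j)) for j in TERMS] for r in PATTERNS}  # rows of K_l
ktnk = [[sum(COUNTS[r][0] * k[r][i] * k[r][j] for r in PATTERNS) for j in range(5)]
        for i in range(5)]
ktnc = [sum(COUNTS[r][1] * k[r][i] for r in PATTERNS) for i in range(5)]  # n_r * C_r = sum of C
w = solve(ktnk, ktnc)
c_hat = {r: sum(wi * xi for wi, xi in zip(w, k[r])) for r in PATTERNS}

# Equation (28), with the C-hat sums weighted by the pattern frequencies.
n_total = sum(n for n, _ in COUNTS.values())
sum_c = sum(s for _, s in COUNTS.values())
sum_chat = sum(COUNTS[r][0] * c_hat[r] for r in PATTERNS)
sum_chat2 = sum(COUNTS[r][0] * c_hat[r] ** 2 for r in PATTERNS)
sum_c_chat = sum(COUNTS[r][1] * c_hat[r] for r in PATTERNS)
r2 = (n_total * sum_c_chat - sum_c * sum_chat) ** 2 / (
    (n_total * SUM_C2 - sum_c ** 2) * (n_total * sum_chat2 - sum_chat ** 2)
)
f = ((ETA2 - r2) / (1 - ETA2)) * ((n_total - 16) / (16 - 5))  # equation (25), p_m = 5
print("w =", [round(x, 3) for x in w])
print(f"R^2 = {r2:.3f}, F = {f:.3f} on (11, 484) d.f.")
```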


Step 5—Calculation of the total score validity


In order to rule out conclusively the total score hypothesis, the zero-order correlation $r_t$ was computed between the total score and the criterion. In general, the total score, $T$, is equal to the number of yes responses for all items with positive first-order coefficients plus the number of no responses for all items with negative first-order coefficients. The column labelled $T$ in Table 3 gives the total score for each answer pattern. The usual formula for the squared correlation was used:

$r_t^2 = \dfrac{[N\sum TC - \sum T\sum C]^2}{[N\sum C^2 - (\sum C)^2][N\sum T^2 - (\sum T)^2]}$.

Using the data from Table 3,

$r_t^2 = \dfrac{[500(36{,}011) - (13{,}936)(1{,}100)]^2}{[500(513{,}126) - (13{,}936)^2][500(2{,}890) - (1{,}100)^2]} = .489$.


Step 6—Comparison of the total score validity with the configural validity


Applying (25),

$F = \left(\dfrac{.612 - .489}{.388}\right)\left(\dfrac{500 - 16}{16 - 2}\right) = 10.960$.

Since, for 14 and 484 d.f., the .05 level is 1.690, the F test indicates that the total score validity is significantly less than the configural validity; i.e., the total score does not yield optimal validity.
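Steps 5 and 6 can be sketched from the $n$ and $\Sigma C$ columns of Table 3. The sketch derives each pattern's total score from the signs of the first-order coefficients found in Step 2, forms the needed sums, and then evaluates $r_t^2$ and (25) with $p_m = 2$. The $\Sigma T^2$ it produces, 2,890, is the value consistent with the printed $r_t^2 = .489$. The names `SIGNS` and `total_score` are ours.

```python
# Answer pattern (tuple of items answered "no") -> (n, sum of C), from Table 3.
COUNTS = {
    (): (18, 605), (1,): (42, 1556), (2,): (27, 604), (3,): (18, 861),
    (4,): (12, 168), (1, 2): (63, 1725), (1, 3): (42, 1997), (1, 4): (28, 505),
    (2, 3): (27, 947), (2, 4): (18, 84), (3, 4): (12, 291), (1, 2, 3): (63, 2321),
    (1, 2, 4): (42, 186), (1, 3, 4): (28, 846), (2, 3, 4): (18, 313),
    (1, 2, 3, 4): (42, 927),
}
SUM_C2 = 513126  # total of the sum-of-squares column of Table 3
ETA2 = 0.612  # configural validity, rounded as in the text
SIGNS = {1: -1, 2: 1, 3: -1, 4: 1}  # signs of d_1..d_4 from Step 2


def total_score(no_items):
    """Yes answers on positively weighted items plus no answers on negatively weighted ones."""
    return sum((item in no_items) == (sign < 0) for item, sign in SIGNS.items())


n_total = sum(n for n, _ in COUNTS.values())
sum_c = sum(s for _, s in COUNTS.values())
sum_t = sum(n * total_score(r) for r, (n, _) in COUNTS.items())
sum_t2 = sum(n * total_score(r) ** 2 for r, (n, _) in COUNTS.items())
sum_tc = sum(s * total_score(r) for r, (_, s) in COUNTS.items())

r2_t = (n_total * sum_tc - sum_t * sum_c) ** 2 / (
    (n_total * SUM_C2 - sum_c ** 2) * (n_total * sum_t2 - sum_t ** 2)
)
f = ((ETA2 - r2_t) / (1 - ETA2)) * ((n_total - 16) / (16 - 2))  # equation (25), p_m = 2
print(sum_t, sum_t2, sum_tc, round(r2_t, 3), round(f, 2))
```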


Step 7—Calculation of the multiple cut-off validity


In order to rule out conclusively the multiple cut-off hypothesis, the zero-order correlation, $r_m$, was computed between the multiple cut-off score and the criterion. Multiple cut-off scoring demands that a score of one be assigned to the answer pattern with the highest (or lowest) mean and that a score of zero be assigned to all other answer patterns. Column $M$ in Table 3 gives the multiple cut-off score for each answer pattern. Substituting figures from Table 3 into the formula for squared correlation,

$r_m^2 = \dfrac{[500(861) - 18(13{,}936)]^2}{[500(513{,}126) - (13{,}936)^2][500(18) - (18)^2]} = .060$.


Step 8—Comparison of the multiple cut-off validity with the configural validity

Applying (25),

$F = \left(\dfrac{.612 - .060}{.388}\right)\left(\dfrac{500 - 16}{16 - 2}\right) = 49.184$.

Since, for 14 and 484 d.f., the .05 confidence level is 1.690, the F test shows that the multiple cut-off validity is significantly less than the configural validity.
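Steps 7 and 8 reduce to a few sums, because the cut-off score is 1 for a single pattern and 0 for all others. The following sketch uses the $n$ and $\Sigma C$ columns of Table 3. It selects the pattern with the highest mean, uses $\Sigma M = \Sigma M^2 = n$ and $\Sigma MC = \Sigma C$ for that pattern, and evaluates $r_m^2$ and (25) with $p_m = 2$.

```python
# Answer pattern (tuple of items answered "no") -> (n, sum of C), from Table 3.
COUNTS = {
    (): (18, 605), (1,): (42, 1556), (2,): (27, 604), (3,): (18, 861),
    (4,): (12, 168), (1, 2): (63, 1725), (1, 3): (42, 1997), (1, 4): (28, 505),
    (2, 3): (27, 947), (2, 4): (18, 84), (3, 4): (12, 291), (1, 2, 3): (63, 2321),
    (1, 2, 4): (42, 186), (1, 3, 4): (28, 846), (2, 3, 4): (18, 313),
    (1, 2, 3, 4): (42, 927),
}
SUM_C2 = 513126  # total of the sum-of-squares column of Table 3
ETA2 = 0.612  # configural validity, rounded as in the text

n_total = sum(n for n, _ in COUNTS.values())
sum_c = sum(s for _, s in COUNTS.values())
# Score 1 for the answer pattern with the highest mean and 0 for all others.
top = max(COUNTS, key=lambda r: COUNTS[r][1] / COUNTS[r][0])
n_top, sum_c_top = COUNTS[top]
# For a 0/1 score, sum(M) = sum(M^2) = n_top and sum(MC) = the sum of C in that pattern.
r2_m = (n_total * sum_c_top - n_top * sum_c) ** 2 / (
    (n_total * SUM_C2 - sum_c ** 2) * (n_total * n_top - n_top ** 2)
)
f = ((ETA2 - r2_m) / (1 - ETA2)) * ((n_total - 16) / (16 - 2))  # equation (25), p_m = 2
print(top, n_top, sum_c_top, round(r2_m, 3), round(f, 2))
```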


Discussion

It has been shown how the concept of configural analysis can be used to give an exact statistical test of whether a given test scoring method has optimal validity. Worked examples were given for three test scoring methods: multiple regression, multiple cut-off, and total score. In general, the principal advantage of configural analysis is that all of the information concerning the subject's test behavior is utilized.

On the other hand, the principal disadvantages of configural analysis lie in the very fact that all the information is conserved; i.e., all possible answer patterns are considered. In a $t$-item test the formula for the configural validity involves $2^t$ parameters, i.e., $2^t$ answer pattern averages. It is immediately obvious that this technique is only appropriate for situations where the number of items is very small compared to the number of subjects: $N$ must be much greater than $2^t$. For example, even when the number of items is as small as 10, $2^t$ will be 1024.

Use of the equav coefficients for scanning purposes introduces another difficulty. The F ratio test no longer gives the exact confidence level; it is simply a decision function. The procedure of selecting the test scoring method which is most likely to yield optimal validity alters the significance level of the F test (cf. [2], p. 199 ff.). As a way of deciding among several possible test scoring methods, the scanning technique is certainly a reasonable procedure. However, it is advisable, after selecting a test scoring method on one sample, to cross-validate it on another sample.

Configural analysis is most suitable in situations where testing time is short and the number of subjects is large. For example, take the case of neuropsychiatric screening in the armed forces, where often only a few minutes of testing time are available and a very large number of subjects must be screened. Here, items should be constructed in such a way that all $2^t$ regression coefficients are significant. This will give maximum discrimination. However, in actual practice some of the regression coefficients will probably be nonsignificant. If this occurs, a value of zero should be given to all nonsignificant coefficients. The use of any other values will lower the validity of the test.


REFERENCES

[1] Horst, P. Pattern analysis and configural scoring. J. clin. Psychol., 1954, 10, 3-11.
[2] Kendall, M. G. The advanced theory of statistics. Vol. II. London: Griffin, 1948.
[3] Lubin, A. and Osburn, H. G. A theory of pattern analysis for the prediction of a quantitative criterion. Psychometrika, 1957, 22, 63-73.


Manuscript received 12/7/56
Revised manuscript received 4/18/57 




PSYCHOMETRIKA—VOL. 22, NO. 4
DECEMBER, 1957 


A MODEL FOR RESPONSE TENDENCY COMBINATION*


DAVID BIRCH


UNIVERSITY OF MICHIGAN 


A model is proposed to predict the performance on a compound stimulus 
as a function of the performance on the component stimuli in a two-choice 
situation. Data from a learning task are used to evaluate the model. 


Any theory of behavior which analyzes a stimulus complex into components and attempts to account for responses to the complex on the basis of response tendencies to the components faces the problem of specifying the rule for the combination of the component response tendencies. Theorists such as Hull [4], Thurstone [8], Gulliksen [3], Spence [7], Estes and Burke [2], Bush and Mosteller [1], and Restle [5] have incorporated combination rules within their theories and then made use of them in deriving implications from their theories. Seldom, however, has the combination rule itself been the focus of attention for these theorists. One recent instance is a study by Schoeffler [6], who carried out a test of a combination rule derived from the Estes-Burke learning theory. The rule is linked directly to the parameters of the theory, and certain assumptions about the parameters are made by Schoeffler in bringing the rule to test.

This paper presents the development of a model for combining response tendencies in a two-choice situation and reports a test of the fit of the model to data. The basis for the definition of the parameters of the proposed model, as well as the impetus for its development, is derived from Hullian behavior theory. However, the combination rule, specified by the interrelationships of the parameters of the model, does not depend upon any particular learning theory and, therefore, may be of value in a variety of situations where problems of combination arise.


A Model for Response Tendency Combination 


The difference in response tendency strength for stimulus $a$, $D_a = ({}_sE_u - {}_sE_v)$, at any given point in time will be in one of three states: $D_a > d$, $D_a < -d$, or $-d \le D_a \le d$, where $d$ is a constant with a value such that $\Pr\{u|D_a > d\} = 1$, $\Pr\{u|D_a < -d\} = 0$, and $\Pr\{u|-d \le D_a \le d\} = .5$. Let $\Pr\{D_a > d\}$, $\Pr\{D_a < -d\}$, and $\Pr\{-d \le D_a \le d\} = 1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}$ be the probabilities that the difference in response tendency strengths is in each of the three states.

*The initial work on the model was carried out under the Summer Faculty Research Fellowship Program of the Horace H. Rackham School of Graduate Studies, University of Michigan. Acknowledgment is also due Mr. Richar[...] [...] in data collection.

It then follows that the probabilities of obtaining $u$ and $v$ when $a$ is presented may be written as

(1)  $\Pr\{u|a\} = \Pr\{D_a > d\} + (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]$

and

(2)  $\Pr\{v|a\} = \Pr\{D_a < -d\} + (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]$.

A corresponding development for $b$ gives

(3)  $\Pr\{u|b\} = \Pr\{D_b > d\} + (.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$

and

(4)  $\Pr\{v|b\} = \Pr\{D_b < -d\} + (.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$.

Since $\Pr\{u|a\} + \Pr\{v|a\} = 1$ and $\Pr\{u|b\} + \Pr\{v|b\} = 1$, there are available two independent equations in the four unknowns, $\Pr\{D_a > d\}$, $\Pr\{D_a < -d\}$, $\Pr\{D_b > d\}$, and $\Pr\{D_b < -d\}$.

The compound stimulus $(a, b)$ falls into one of four classes according to whether $a$ and $b$ are both associated with $u$, $a$ with $u$ and $b$ with $v$, $a$ with $v$ and $b$ with $u$, or both $a$ and $b$ with $v$. The corresponding designations of $(a, b)$ are $(a_u, b_u)$, $(a_u, b_v)$, $(a_v, b_u)$, and $(a_v, b_v)$.

If the total probability of $u$ to the presentation of $(a, b)$ is denoted $\Pr\{u|(a, b)\}$, then

(5)  $\Pr\{u|(a, b)\} = \Pr\{u|(a_u, b_u)\}\Pr\{a_u, b_u\} + \Pr\{u|(a_u, b_v)\}\Pr\{a_u, b_v\} + \Pr\{u|(a_v, b_u)\}\Pr\{a_v, b_u\} + \Pr\{u|(a_v, b_v)\}\Pr\{a_v, b_v\}$,

where the entries on the right-hand side of the equation are the independent contributions from the four classes of $(a, b)$. Each of these terms can be written separately as a function of $\Pr\{D_a > d\}$, $\Pr\{D_a < -d\}$, $\Pr\{D_b > d\}$, and $\Pr\{D_b < -d\}$. This makes a total of six experimentally independent equations in the four unknowns available, so the values of the unknowns are overdetermined.
It follows from (1), (2), (3), and (4) that the probabilities of occurrence of the four classes of $(a, b)$ are

(6)  $\Pr\{a_u, b_u\} = \Pr\{u|a\}\cdot\Pr\{u|b\} = \Pr\{D_a > d\}\Pr\{D_b > d\}$
        $+ \Pr\{D_a > d\}(.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]\Pr\{D_b > d\}$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}](.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$;

(7)  $\Pr\{a_u, b_v\} = \Pr\{u|a\}\cdot\Pr\{v|b\} = \Pr\{D_a > d\}\Pr\{D_b < -d\}$
        $+ \Pr\{D_a > d\}(.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]\Pr\{D_b < -d\}$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}](.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$;

(8)  $\Pr\{a_v, b_u\} = \Pr\{v|a\}\cdot\Pr\{u|b\} = \Pr\{D_a < -d\}\Pr\{D_b > d\}$
        $+ \Pr\{D_a < -d\}(.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]\Pr\{D_b > d\}$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}](.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$;

and

(9)  $\Pr\{a_v, b_v\} = \Pr\{v|a\}\cdot\Pr\{v|b\} = \Pr\{D_a < -d\}\Pr\{D_b < -d\}$
        $+ \Pr\{D_a < -d\}(.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}]\Pr\{D_b < -d\}$
        $+ (.5)[1 - \Pr\{D_a > d\} - \Pr\{D_a < -d\}](.5)[1 - \Pr\{D_b > d\} - \Pr\{D_b < -d\}]$.

The probability of $u$ for each of the classes may be obtained by weighting each component of $\Pr\{a_u, b_u\}$, $\Pr\{a_u, b_v\}$, $\Pr\{a_v, b_u\}$, and $\Pr\{a_v, b_v\}$ by an appropriate conditional probability of occurrence of $u$. The weights assumed are as follows.

The conditional probability of $u$ is 1 when the response tendency states for $a$ and $b$ are:
- $D_a > d$ and $D_b > d$,
- $D_a > d$ and $-d \le D_b \le d$, or
- $-d \le D_a \le d$ and $D_b > d$.

The conditional probability of $u$ is 0 given:
- $D_a < -d$ and $D_b < -d$,
- $D_a < -d$ and $-d \le D_b \le d$, or
- $-d \le D_a \le d$ and $D_b < -d$.

The conditional probability of $u$ is .5 given:
- $D_a > d$ and $D_b < -d$,
- $D_a < -d$ and $D_b > d$, or
- $-d \le D_a \le d$ and $-d \le D_b \le d$.

These assumed values are presented in Table 1.


TABLE 1
Assumed Conditional Probability of Occurrence of u to the Compound Stimulus for the Possible Combinations of Response Tendency States of a and b

Response Tendency       Response Tendency States for b
States for a            (D_b > d)    (-d ≤ D_b ≤ d)    (D_b < -d)
(D_a > d)                   1              1               .5
(-d ≤ D_a ≤ d)              1             .5                0
(D_a < -d)                 .5              0                0

These weights in conjunction with (6), (7), (8), and (9) produce four equations in the four unknowns $\Pr\{D_a > d\}$, $\Pr\{D_a < -d\}$, $\Pr\{D_b > d\}$, and $\Pr\{D_b < -d\}$. The model under development was instigated by the problem of predicting performance to a compound stimulus as a function of performance to the component stimuli. Accordingly, the relationships of (1), (2), (3), and (4) may be used to reduce (6), (7), (8), and (9) to functions of the two unknowns $\Pr\{D_a > d\}$ and $\Pr\{D_b > d\}$. The resulting simplified equations are

(10)  $\Pr\{u|(a_u, b_u)\}\cdot\Pr\{a_u, b_u\} = (.5)[\Pr\{D_a > d\}\Pr\{u|b\} + \Pr\{D_b > d\}\Pr\{u|a\} - \Pr\{D_a > d\}\Pr\{D_b > d\} + \Pr\{u|a\}\Pr\{u|b\}]$;

(11)  $\Pr\{u|(a_u, b_v)\}\cdot\Pr\{a_u, b_v\} = (.5)[\Pr\{D_a > d\}\Pr\{v|b\} - \Pr\{D_b > d\}\Pr\{u|a\} + \Pr\{u|a\}\Pr\{u|b\}]$;

(12)  $\Pr\{u|(a_v, b_u)\}\cdot\Pr\{a_v, b_u\} = (.5)[\Pr\{D_b > d\}\Pr\{v|a\} - \Pr\{D_a > d\}\Pr\{u|b\} + \Pr\{u|a\}\Pr\{u|b\}]$;

and

(13)  $\Pr\{u|(a_v, b_v)\}\cdot\Pr\{a_v, b_v\} = (.5)[-\Pr\{D_a > d\}\Pr\{u|b\} - \Pr\{D_b > d\}\Pr\{u|a\} + \Pr\{D_a > d\}\Pr\{D_b > d\} + \Pr\{u|a\}\Pr\{u|b\}]$.

Finally, (5) becomes

(14)  $\Pr\{u|(a, b)\} = \Pr\{D_a > d\}[(.5) - \Pr\{u|b\}] + \Pr\{D_b > d\}[(.5) - \Pr\{u|a\}] + 2\Pr\{u|a\}\Pr\{u|b\}$.
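Equation (14) can be checked against a direct computation that weights all nine combinations of response tendency states by the conditional probabilities in Table 1. The following minimal sketch assumes that the states of $a$ and $b$ combine independently, as the products in (6) to (9) imply. The state probabilities in the usage lines are hypothetical.

```python
# Conditional probability of u given the state of a (rows) and of b (columns),
# states ordered as (D > d, -d <= D <= d, D < -d); values from Table 1.
WEIGHTS = [[1.0, 1.0, 0.5], [1.0, 0.5, 0.0], [0.5, 0.0, 0.0]]


def p_u_single(high, low):
    """Equations (1) and (3): Pr{u} for one stimulus from Pr{D > d} and Pr{D < -d}."""
    return high + 0.5 * (1 - high - low)


def p_u_compound_direct(a_high, a_low, b_high, b_low):
    """Sum of weight * Pr(state of a) * Pr(state of b) over the nine combinations."""
    a_states = (a_high, 1 - a_high - a_low, a_low)
    b_states = (b_high, 1 - b_high - b_low, b_low)
    return sum(WEIGHTS[i][j] * pa * pb
               for i, pa in enumerate(a_states) for j, pb in enumerate(b_states))


def p_u_compound_eq14(a_high, b_high, p_u_a, p_u_b):
    """Equation (14): needs only Pr{D_a > d}, Pr{D_b > d} and the two component Pr{u}."""
    return a_high * (0.5 - p_u_b) + b_high * (0.5 - p_u_a) + 2 * p_u_a * p_u_b


a_high, a_low, b_high, b_low = 0.5, 0.2, 0.3, 0.4  # hypothetical state probabilities
p_a, p_b = p_u_single(a_high, a_low), p_u_single(b_high, b_low)
print(p_u_compound_direct(a_high, a_low, b_high, b_low),
      p_u_compound_eq14(a_high, b_high, p_a, p_b))
```

Both calls give .565 for these inputs. This illustrates that (14) needs only the two component response probabilities and the two unknowns $\Pr\{D_a > d\}$ and $\Pr\{D_b > d\}$.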




It may also be noted from (10) and (13) that 

2 Pr [u|(a, , b.)}-Pr (a, , bu} — Pr {ula}-Pr {ujb} 
= Pr (D, > d]-Pr (u|b] + Pr (D, > d}-Pr {ula} 
— Pr (D, > d}-Pr (D, > d} = Pr {ula}-Pr (u|bl 
— 9 Pr {ul(a, , b)] Pr (a. δι], 

which indicates that it is necessary that 

Pr {ија} Pr (ulb] — Pr fullas , 1)} 
“Pr (a, , ba} = Pr [ul(a, , 5)] Pr (a. , bs} 


if the equations are to be consistent. The latter relationship provides a test of the model, since all four of the values are experimentally independent observables. If this relationship can be shown to hold within reasonable limits, then (10) and (13) may be combined into


$$\Pr\{u\mid(a_u,b_u)\}\Pr\{a_u,b_u\} - \Pr\{u\mid(a_{\bar u},b_{\bar u})\}\Pr\{a_{\bar u},b_{\bar u}\} = \Pr\{D_a>d\}\Pr\{u\mid b\} + \Pr\{D_b>d\}\Pr\{u\mid a\} - \Pr\{D_a>d\}\Pr\{D_b>d\}. \tag{15}$$


A Test of the Model 


To obtain data for a test of the model, a learning task was carried out. Subjects were required to associate the response Dac with each of ten letter pairs and ten number pairs, and the response Jix with each of another set of ten letter pairs and ten number pairs. In dealing with the resulting data, it is convenient to define $u$ as a correct response, $C$, and $\bar u$ as an incorrect response, $I$. Also, stimulus $a$ is defined as the set of twenty letter pairs, $L$, and stimulus $b$ as the set of twenty number pairs, $N$.

In constructing the letter pairs, a total of ten letters (B, F, G, H, K, N, Q, S, Y, and Z) were used. These were paired so that each letter appeared twice in the first position, once to be associated with Dac and once with Jix. Each letter also appeared twice in the second position, again once to be associated with Dac and once with Jix. No two letters were ever paired more than once. The ten digits (0 through 9) were treated in a similar manner.

The subjects, 145 male volunteers from the introductory psychology class, participated in groups of approximately 14. To provide an opportunity for learning, the letter pairs and number pairs were shown individually in a random sequence on flash cards for 10 seconds, with the correct response exposed during the last 6 seconds. Subjects recorded no responses during the learning series. They did record their responses during the test series, which alternated with the learning series.


378 PSYCHOMETRIKA 


Three learning and three test series were given. In each test series, all 20 letter pairs, all 20 number pairs, and 20 compounds were presented individually in random order. Each compound was made up of a letter pair and a number pair chosen at random without replacement, under the restriction that the same response be correct for both the letter pair and the number pair. Each test series employed a different pairing of the letters and numbers in the compounds, so that the same compound never appeared more than once. Exposure time for each stimulus in the test series was five seconds, during which time the response was written.

Five experimental groups are distinguished on the basis of the relative 
amount of training offered on letters and numbers. For Group I of 30 sub- 
jects, two exposures of each letter pair and each number pair were provided 
in each learning series (L:N = 2:2); for Group II of 28 subjects (L:N = 2:1); 
for Group III of 33 subjects (L:N = 1:2); for Group IV of 26 subjects
(L:N = 3:1); and for Group V of 28 subjects (L:N = 1:3).

The measure of the probability of correct response to the letters, the 
numbers, and the four classes of the compound is obtained for each of the 15 
test series by pooling over stimuli and subjects. Table 2 contains these data. 


TABLE 2

Obtained Probability of Correct Response for Letters Alone, Numbers Alone, and the Four Classes of the Letter-Number Compounds

Columns: Group I (L:N = 2:2), Group II (L:N = 2:1), Group III (L:N = 1:2), Group IV (L:N = 3:1), Group V (L:N = 1:3); Tests 1, 2, 3 within each group.
Rows: $\Pr\{C\mid L\}$; $\Pr\{C\mid N\}$; $\Pr\{C\mid(L_C,N_C)\}\Pr\{L_C,N_C\}$; $\Pr\{C\mid(L_C,N_I)\}\Pr\{L_C,N_I\}$; $\Pr\{C\mid(L_I,N_C)\}\Pr\{L_I,N_C\}$; $\Pr\{C\mid(L_I,N_I)\}\Pr\{L_I,N_I\}$; $\Pr\{C\mid L,N\}$.
[Numeric entries are not legible in the scanned source.]


A first test of the model comes from the relationship $\Pr\{C\mid L\}\Pr\{C\mid N\} - \Pr\{C\mid(L_C,N_C)\}\Pr\{L_C,N_C\} = \Pr\{C\mid(L_I,N_I)\}\Pr\{L_I,N_I\}$, derived from (10) and (13). For the 15 observations, the differences between the left side and the right side of this equation have:

- a mean of −.004;
- a range of −.03 to .05;
- a root mean square deviation around 0 of .02.

The fit appears good enough that (11), (12), and (15) may be used in a further test of the model.
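The three summary figures quoted above can be recomputed once Table 2 is available. They are the mean, the range, and the root mean square deviation about 0 of the left-minus-right differences. Below is a minimal sketch. The function names and the tuple layout of each row are illustrative choices. The numbers in the usage check are made up, not taken from Table 2.

```python
from math import sqrt
from statistics import mean


def consistency_residuals(rows):
    """Left minus right side of
    Pr{C|L}Pr{C|N} - Pr{C|(L_C,N_C)}Pr{L_C,N_C} = Pr{C|(L_I,N_I)}Pr{L_I,N_I}.

    Each row is (pr_c_given_l, pr_c_given_n, joint_cc, joint_ii). The joint
    terms are the products Pr{C|(.,.)}Pr{.,.} for the (L_C,N_C) and (L_I,N_I)
    classes. One residual is returned per test series.
    """
    return [pl * pn - cc - ii for pl, pn, cc, ii in rows]


def summarize(residuals):
    """Mean, range, and root mean square deviation about 0."""
    return {
        "mean": mean(residuals),
        "range": (min(residuals), max(residuals)),
        "rms_about_0": sqrt(mean(r * r for r in residuals)),
    }
```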




TABLE 3

Derived Values for $\Pr\{D_L > d\}$ and $\Pr\{D_N > d\}$ and Predicted Probability of Correct Response for the Four Classes of the Letter-Number Compounds

Columns: Group I (L:N = 2:2), Group II (L:N = 2:1), Group III (L:N = 1:2), Group IV (L:N = 3:1), Group V (L:N = 1:3); Tests 1, 2, 3 within each group.
Rows: $\Pr\{D_L > d\}$; $\Pr\{D_N > d\}$; $\Pr\{C\mid(L_C,N_C)\}\Pr\{L_C,N_C\}$; $\Pr\{C\mid(L_C,N_I)\}\Pr\{L_C,N_I\}$; $\Pr\{C\mid(L_I,N_C)\}\Pr\{L_I,N_C\}$; $\Pr\{C\mid(L_I,N_I)\}\Pr\{L_I,N_I\}$; $\Pr\{C\mid L,N\}$.
[Numeric entries are not legible in the scanned source.]


With three equations, (11), (12), and (15), the two unknowns are overdetermined. Since there is no a priori reason for selecting any particular pair of equations for solution, all three solutions were obtained, and their means were used as the best estimates of $\Pr\{D_L > d\}$ and $\Pr\{D_N > d\}$. Accordingly, the appropriate empirical values for each of the three tests for each of the five groups were substituted into the equations, and solutions for $\Pr\{D_L > d\}$ and $\Pr\{D_N > d\}$ were obtained. The resulting mean values are shown in Table 3.

To obtain an indication of the consistency of the three equations, the standard deviation of the three estimates of $\Pr\{D_L > d\}$ and of $\Pr\{D_N > d\}$ was computed for each of the 15 determinations. For $\Pr\{D_L > d\}$, these values ranged from .01 to .19, with a mean of .09 and a median of .09. For $\Pr\{D_N > d\}$, the range was .01 to .19, with a mean of .10 and a median of .12.
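The three pairwise solutions can be sketched as follows. The sketch uses this notation:

- x = Pr{D_L > d} and y = Pr{D_N > d};
- O11 and O12 are the observed left-hand sides of (11) and (12);
- O15 is the observed left-hand side of (15), the (L_C, N_C) product minus the (L_I, N_I) product.

Equations (11) and (12) are linear in x and y. Pairing either one with (15) gives a quadratic, which has two roots.

Root selection rests on an assumption: Pr{C|L} ≥ Pr{D_L > d} and Pr{C|N} ≥ Pr{D_N > d}. This holds if a component in its strong-correct state always draws C. Under it, the unwanted root, for example Pr{C|L}(1 − y)/(1 − Pr{C|N}), is never smaller than the wanted one, so the sketch takes the smaller root.

The root rule, the helper names, and the use of the population standard deviation are illustrative choices, not details given in the paper.

```python
import math
from statistics import mean, pstdev


def solve_11_12(ul, un, o11, o12):
    """Solve the linear pair (11), (12) for x = Pr{D_L > d}, y = Pr{D_N > d}."""
    c1 = 2 * o11 - ul * un      # (11) rearranged: x(1 - un) - y*ul = c1
    c2 = 2 * o12 - ul * un      # (12) rearranged: -x*un + y(1 - ul) = c2
    det = 1 - ul - un
    if abs(det) < 1e-12:
        raise ValueError("(11) and (12) are singular when Pr{C|L} + Pr{C|N} = 1")
    x = (c1 * (1 - ul) + ul * c2) / det
    y = ((1 - un) * c2 + un * c1) / det
    return x, y


def _smaller_root(a, b, c):
    """Smaller real root of a*t**2 + b*t + c = 0, for a > 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real solution")
    return (-b - math.sqrt(disc)) / (2 * a)


def solve_11_15(ul, un, o11, o15):
    """Eliminate y with (11); (15) then becomes a quadratic in x."""
    c1 = 2 * o11 - ul * un
    x = _smaller_root(1 - un, -(ul + c1), ul * (c1 + o15))
    return x, (x * (1 - un) - c1) / ul


def solve_12_15(ul, un, o12, o15):
    """Eliminate x with (12); (15) then becomes a quadratic in y."""
    c2 = 2 * o12 - ul * un
    y = _smaller_root(1 - ul, -(un + c2), un * (c2 + o15))
    return (y * (1 - ul) - c2) / un, y


def estimate(ul, un, o11, o12, o15):
    """Mean and population SD of the three pairwise estimates of x and y."""
    sols = [solve_11_12(ul, un, o11, o12),
            solve_11_15(ul, un, o11, o15),
            solve_12_15(ul, un, o12, o15)]
    xs, ys = zip(*sols)
    return mean(xs), mean(ys), pstdev(xs), pstdev(ys)
```

With error-free inputs generated from the model, all three pairs recover the same x and y, and both standard deviations are zero. With real data, the spread of the three solutions gives the consistency indicator described above.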

Table 3 contains the predicted values for the following quantities, computed using the mean $\Pr\{D_L > d\}$ and the mean $\Pr\{D_N > d\}$:

- $\Pr\{C\mid(L_C,N_C)\}\Pr\{L_C,N_C\}$;
- $\Pr\{C\mid(L_C,N_I)\}\Pr\{L_C,N_I\}$;
- $\Pr\{C\mid(L_I,N_C)\}\Pr\{L_I,N_C\}$;
- $\Pr\{C\mid(L_I,N_I)\}\Pr\{L_I,N_I\}$;
- $\Pr\{C\mid L,N\}$.

Comparing the obtained values of Table 2 with the predicted values of Table 3 shows a quite satisfactory fit, with one exception: a small, consistent bias. The predicted values for the $(L_C,N_C)$, $(L_C,N_I)$, and $(L_I,N_C)$ classes tend to be too high, and the predicted value for the $(L_I,N_I)$ class tends to be too low. The mean differences between predicted and obtained values for these four classes of compound stimuli are .007, .012, .013, and −.013, respectively.

Root mean square deviations of the differences around 0 were also computed:

- for the same four classes of compound stimuli: .021, .016, .018, and .028;
- for the total, $\Pr\{C\mid L,N\}$: .031.

These values further indicate that the model fits these data adequately.


REFERENCES

[1] Bush, R. R. and Mosteller, F. A mathematical model for simple learning. Psychol. Rev., 1951, 58, 313-323.
[2] Estes, W. K. and Burke, C. J. A theory of stimulus variability in learning. Psychol. Rev., 1953, 60, 276-286.
[3] Gulliksen, H. A generalization of Thurstone's learning function. Psychometrika, 1953, 18, 297-307.
[4] Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts, 1943.
[5] Restle, F. A theory of discrimination learning. Psychol. Rev., 1955, 62, 11-19.
[6] Schoeffler, M. O. Probability of response to compounds of discriminated stimuli. J. exp. Psychol., 1954, 48, 323-329.
[7] Spence, K. W. The nature of discrimination learning in animals. Psychol. Rev., 1936, 43, 427-449.
[8] Thurstone, L. L. The learning function. J. gen. Psychol., 1930, 3, 469-493.


Manuscript received 12/10/56 


Revised manuscript received 2/20/57 


PSYCHOMETRIKA—VOL. 22, No. 4
DECEMBER, 1957


PROCEDURES FOR OBTAINING SEPARATE SET AND
CONTENT COMPONENTS OF A TEST SCORE*


GERALD C. HELMSTADTER


COLORADO STATE UNIVERSITY†


Using two distinct models, several formulas for obtaining separate set and content components of a test score have been derived. The methods are compared algebraically and through their application to a set of test data apparently affected by response sets.


Cronbach [1] has pointed out that when a test is composed of difficult items having only two or three alternatives, a response set is likely to affect the total test score. By response set is meant "any tendency causing a person consistently to give different responses to test items than he would when the same content is presented in different form." Cronbach has further shown [2] that such an effect is stable on retest. He feels there is adequate evidence to conclude that various response sets reflect "real" dimensions of human differences. As he further points out, some of the response-set variance is potentially useful, while some of it will interfere with measurement. To capitalize on response set when it is useful and to eliminate it when it is undesirable, some procedure is needed for obtaining separate set and content components of a test score.

This paper presents a logical basis for separating the response set from the score that reflects individual differences in the obvious item content, and it compares a number of scoring procedures for doing so. Response sets most commonly occur in tests composed of items with two alternatives, so the following discussion is restricted to that case. Also, for convenience, it will be assumed that no items are omitted.

To simplify the discussion, a single set of notation is used throughout. For convenience, the symbols are listed together below. Each term is explained further where it is first used.
*The author wishes to thank Dr. Norman Frederiksen, who brought this problem to his attention, and Drs. Ledyard R Tucker and Frederic M. Lord, who suggested three of the procedures. All have contributed to this paper through many helpful suggestions and criticisms.

†This paper was written while the author was an Associate in Research at the Educational Testing Service, Princeton, New Jersey.




$N_\alpha$ and $N_\beta$ = number of items keyed A and B, respectively.*
$K_\alpha$ and $K_\beta$ = number of items keyed A and B, respectively, which have been answered correctly on the basis of content.
$P_A$ and $P_B$ = the examinee's probability of marking A (respectively, B) an item that is not answered on the basis of content.
$R_A$ = number of items keyed A and marked A by the examinee.
$R_B$ = number of items keyed B and marked B by the examinee.
$W_A$ = number of items keyed B but marked A by the examinee.
$W_B$ = number of items keyed A but marked B by the examinee.
$A_{\alpha a_i}$, $A_{\alpha b_i}$, $A_{\beta a_i}$, $A_{\beta b_i}$ = the proportion of examinees who marked A for an item. The item is classified by the subscripts as follows: $\alpha$ = keyed A; $\beta$ = keyed B; $a_i$ = marked A by person $i$; $b_i$ = marked B by person $i$.
$C_j$ = content score by procedure $j$.
$S_j$ = set score by procedure $j$. ($S_j$ will always mean a set to mark response A.)


The Scoring Procedures 
Usual, Total Score Procedure 


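Ordinarily, a test is scored simply by counting the number of items marked in agreement with a key. In the present terminology, the content score by this procedure can be expressed as

$$C_1 = R_A + R_B. \tag{1}$$

A companion set score is the difference between the number of items the examinee marked A and the number he marked B. Thus, let

$$S_1 = (R_A + W_A) - (R_B + W_B). \tag{2}$$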


Warranted Set Procedure


Another approach starts from the matrix of keyed responses against the examinee's responses, illustrated in Figure 1. One could think of the entries of this matrix as follows:

$R_A$ and $R_B$ = examinee's warranted set to mark A and to mark B, respectively;
$W_A$ and $W_B$ = examinee's unwarranted set to mark A and to mark B, respectively;
$(R_A + W_A)$ and $(R_B + W_B)$ = examinee's total set to mark A and to mark B, respectively.

                          Examinee's Response
                          A              B
Keyed Response   A       R_A            W_B
                 B       W_A            R_B
Total                    R_A + W_A      R_B + W_B

Fig. 1. Observed Score Matrix

Then, if any indeterminate ratios are put equal to zero, a content score could be defined as

$$C_2 = \frac{\text{warranted set to mark A}}{\text{total set to mark A}} + \frac{\text{warranted set to mark B}}{\text{total set to mark B}} = \frac{R_A}{R_A + W_A} + \frac{R_B}{R_B + W_B}. \tag{3}$$

Similarly, it would be possible to define a set score as

$$S_2 = \frac{\text{unwarranted set to mark A}}{\text{total set to mark A}} - \frac{\text{unwarranted set to mark B}}{\text{total set to mark B}} = \frac{W_A}{R_A + W_A} - \frac{W_B}{R_B + W_B}. \tag{4}$$

*The Greek subscripts indicate the way an item was keyed, while the English subscripts indicate the way the item was marked by the individual.
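The warranted-set scores (3) and (4) can be sketched directly from the four cells of the observed score matrix. Following the text, any 0/0 ratio is set to zero. The function and argument names are illustrative.

```python
def _ratio(num, den):
    """Quotient, with indeterminate ratios (0/0) put equal to zero as in the text."""
    return num / den if den else 0.0


def warranted_set_scores(r_a, r_b, w_a, w_b):
    """C_2 and S_2 from the cells of the observed score matrix (Figure 1).

    r_a: keyed A, marked A      w_a: keyed B, marked A
    r_b: keyed B, marked B      w_b: keyed A, marked B
    """
    total_a = r_a + w_a          # total set to mark A
    total_b = r_b + w_b          # total set to mark B
    c2 = _ratio(r_a, total_a) + _ratio(r_b, total_b)   # equation (3)
    s2 = _ratio(w_a, total_a) - _ratio(w_b, total_b)   # equation (4)
    return c2, s2
```

For a hypothetical examinee with R_A = 40, W_B = 5 (so N_α = 45) and R_B = 20, W_A = 5 (so N_β = 25), this gives C_2 = 40/45 + 20/25 and S_2 = 5/45 − 5/25. The negative S_2 reflects a relative leaning toward B.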


Postulated Knowledge Procedure

A third way to score a test that is subject to the effects of a response set is to estimate the number of responses actually based on content. This follows logic similar to that used in deriving scores which "correct for guessing."

The number of items observed as marked in agreement with the key (i.e., $R_A$ and $R_B$) can be thought of as the sum of two groups:

- items marked in agreement with the key on the basis of content;
- items marked in this way on the basis of set.

Let $K_\alpha$ and $K_\beta$ represent the number of items marked A and B, respectively, on the basis of content. Then $N_\alpha - K_\alpha$ and $N_\beta - K_\beta$ represent the number of $\alpha$-items and $\beta$-items, respectively, that have been answered on some other basis.

Let $P_A$ and $P_B$ be the probabilities that an individual marks an item A or B, respectively, on a basis other than content. Then the numbers of $\alpha$-items and $\beta$-items marked in agreement with the key on a basis other than content will be $P_A(N_\alpha - K_\alpha)$ and $P_B(N_\beta - K_\beta)$, respectively. Thus, it can be said that

$$R_A = K_\alpha + P_A(N_\alpha - K_\alpha) \tag{5}$$

and

$$R_B = K_\beta + P_B(N_\beta - K_\beta). \tag{6}$$

If no omits are permitted, the examinee must mark either A or B, and thus

$$P_A + P_B = 1. \tag{7}$$

A final equation, needed to obtain values for $K_\alpha$, $K_\beta$, $P_A$, and $P_B$, can be obtained by assuming that the $\alpha$-items are equal in difficulty to the $\beta$-items. This assumption can be expressed as

$$\frac{K_\alpha}{N_\alpha} = \frac{K_\beta}{N_\beta}. \tag{8}$$

The simultaneous solution of these last four equations yields

$$K_\alpha = R_A - \frac{N_\alpha}{N_\beta}W_A, \tag{9}$$

$$K_\beta = R_B - \frac{N_\beta}{N_\alpha}W_B, \tag{10}$$

$$P_A = \frac{N_\alpha W_A}{N_\alpha W_A + N_\beta W_B}, \tag{11}$$

$$P_B = \frac{N_\beta W_B}{N_\alpha W_A + N_\beta W_B}. \tag{12}$$

Given these values, the obvious content score is

$$C_3 = K_\alpha + K_\beta \tag{13}$$

$$= R_A + R_B - \left(\frac{N_\alpha}{N_\beta}W_A + \frac{N_\beta}{N_\alpha}W_B\right). \tag{14}$$

$P_A$ itself could serve as a measure of set toward A, but it is more convenient to let

$$S_3 = 2P_A - 1. \tag{15}$$


Here $S_3$ ranges from −1 to +1. It is 0 when the examinee is just as likely to mark an unknown item A as he is to mark it B. Thus,

$$S_3 = \frac{2N_\alpha W_A}{N_\alpha W_A + N_\beta W_B} - 1. \tag{16}$$
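The closed-form solution (9)–(12) can be checked by substituting it back into (5)–(8). Below is a minimal sketch. It assumes no omits, so that W_A = N_β − R_B and W_B = N_α − R_A. It raises an error for a perfect paper, where (11) and (12) become 0/0. The function and key names are illustrative.

```python
def postulated_knowledge(r_a, r_b, n_alpha, n_beta):
    """K_alpha, K_beta, P_A, P_B from (9)-(12), and the scores C_3, S_3.

    r_a, r_b: items keyed A (B) that were marked in agreement with the key.
    n_alpha, n_beta: numbers of items keyed A and B.
    """
    w_a = n_beta - r_b                      # keyed B but marked A
    w_b = n_alpha - r_a                     # keyed A but marked B
    k_alpha = r_a - (n_alpha / n_beta) * w_a            # (9)
    k_beta = r_b - (n_beta / n_alpha) * w_b             # (10)
    denom = n_alpha * w_a + n_beta * w_b
    if denom == 0:
        raise ValueError("no errors: P_A and P_B are indeterminate")
    p_a = n_alpha * w_a / denom                         # (11)
    p_b = n_beta * w_b / denom                          # (12)
    return {
        "K_alpha": k_alpha, "K_beta": k_beta, "P_A": p_a, "P_B": p_b,
        "C_3": k_alpha + k_beta,                        # (13)
        "S_3": 2 * p_a - 1,                             # (15)
    }
```

For a hypothetical examinee with R_A = 40, R_B = 20, N_α = 45, and N_β = 25, the sketch gives K_α = 31, K_β ≈ 17.22, and P_A = 225/350. Substituting these into (5) and (6) returns R_A = 40 and R_B = 20.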




Orthogonal Score Procedure 


Another possibility is to treat the set and content scores as orthogonal traits. Each examinee is then represented by a point in a plot:

- on one axis, the proportion of items keyed A that were marked A (i.e., $R_A/N_\alpha$);
- on the other, the proportion of items keyed B that were marked A (i.e., $W_A/N_\beta$).

In this plot, the vector from (0, 0) to (1, 1) can be considered a set axis, and the vector from (0, 1) to (1, 0) a content axis. The examinee's set and content scores can then be defined as some function of the projection of his plotted point on these respective axes, e.g., the function given in Figure 2.


Fig. 2. Set and Content as Orthogonal Traits.


The scores can readily be expressed in terms of the observed measures by applying a 45° rotation and an appropriate translation of the axes. For convenience, the translations used make all content scores positive and make set scores equal to zero at a chance level (i.e., when $P_A = P_B = \tfrac{1}{2}$). Thus, by this procedure:

$$C_4 = .707\left[\frac{R_A}{N_\alpha} - \frac{W_A}{N_\beta} + 1\right]; \tag{17}$$

$$S_4 = .707\left[\frac{R_A}{N_\alpha} + \frac{W_A}{N_\beta} - 1\right]. \tag{18}$$
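Equations (17) and (18) can be sketched as follows. The factor .707 in the text is 1/√2 rounded. The code uses the exact value, which changes the scores only around the fourth decimal place. Names are illustrative.

```python
import math


def orthogonal_scores(r_a, r_b, n_alpha, n_beta):
    """C_4 and S_4 from (17)-(18): a 45-degree rotation of the point
    (R_A/N_alpha, W_A/N_beta), with the translations described in the text."""
    x = r_a / n_alpha                  # proportion of A-keyed items marked A
    y = (n_beta - r_b) / n_beta        # proportion of B-keyed items marked A (W_A/N_beta)
    h = 1 / math.sqrt(2)               # the ".707" of the text
    return h * (x - y + 1), h * (x + y - 1)
```

A perfect paper maps to (C_4, S_4) = (√2, 0). An examinee who marks half of each item group A maps to a set score of 0, as the chosen translation intends.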


Postulated Scale Score 


All of the procedures thus far discussed have assumed that the items 
in the test were dichotomous with respect to content. That is, the items 
have been considered to represent either A or not A. The solutions to the 
problem of obtaining separate set and content scores presented in this and 
the following sections make a different assumption: that the extent to which 
items represent A can be expressed as a continuous variable. Thus, the test
items are visualized as falling along a unidimensional scale characterized 
by the content. For example, in an inventory designed to detect an authori- 
tarian personality, it might be preferable to think of statements (with which 
a respondent is asked to indicate his agreement or disagreement) as repre- 


senting various degrees of authoritarianism rather than as being classified 
as authoritarian and nonauthoritarian. 


One possibility under this view is to postulate, for each individual, a characteristic curve. The curve relates the probability that he marks an item A to the scale value of that item. Two such curves are illustrated in Figure 3.

Fig. 3. Item Characteristic Curve for Two Individuals. (Vertical axis: probability of marking an item A, from 0 to 1.0; horizontal axis: tendency of item to represent A.)

The individual's ability to discriminate among items of different scale values would then be represented by the slope of the curve. On the other hand, his tendency to call all items A or all items B would be represented by some index of central tendency that locates the position of the curve on the scale. The scale value corresponding to a probability of 1/2 of marking an item A could be used for this purpose.

The problem, then, is to determine the important parameters of this 
characteristic curve from observations of one or zero (i.e., calling an item A 
or not calling an item A) for each person for each item. While theoretically it 
would be possible to obtain the maximum likelihood estimators of the desired 
parameters, preliminary work, specifying first the normal ogive and then a 
straight line as the form of the characteristic curve, indicated that the solu- 
tions are far too complex to be of practical value. 

One approximation that is feasible, however, is the following. Assume that all the items fall at only two points, which differ along the scale characterized by A. The items keyed A can be considered estimates of one point, and those keyed B estimates of the second. The scale value of each point can then be obtained by averaging, within each group of items, the normal deviates corresponding to the proportion of persons marking the item A, that is, by taking


$$Z_\alpha = \frac{1}{N_\alpha}\Big(\sum Z_{A_{\alpha a_i}} + \sum Z_{A_{\alpha b_i}}\Big) \tag{19}$$

and

$$Z_\beta = \frac{1}{N_\beta}\Big(\sum Z_{A_{\beta a_i}} + \sum Z_{A_{\beta b_i}}\Big), \tag{20}$$

where $Z$ is the normal deviate corresponding to the indicated proportions. For each individual, the normal deviates for the proportion of A items and the proportion of B items that he marked A can then be plotted at these points, as indicated in Figure 4. The slope of the line through these two points can be used as a content score, and the height of the line at its midpoint as a set score. Thus,

$$C_5 = \frac{Z_{R_A/N_\alpha} - Z_{W_A/N_\beta}}{Z_\alpha - Z_\beta} \tag{21}$$

and

$$S_5 = \tfrac{1}{2}\left(Z_{R_A/N_\alpha} + Z_{W_A/N_\beta}\right). \tag{22}$$
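The two-point approximation (19)–(22) can be sketched with Python's `statistics.NormalDist` for the normal deviates. That function requires every proportion to lie strictly between 0 and 1. An item or examinee at 0 or 1 raises an error, so some convention for such cases would be needed in practice; the sketch does not choose one. The caller supplies the item proportions and keyed labels. Names are illustrative.

```python
from statistics import NormalDist, mean

Z = NormalDist().inv_cdf   # normal deviate for a proportion strictly in (0, 1)


def scale_points(prop_marked_a, keyed):
    """(19)-(20): mean normal deviate of the proportion of the group marking
    each item A, taken separately over items keyed "A" and items keyed "B"."""
    z_alpha = mean(Z(p) for p, k in zip(prop_marked_a, keyed) if k == "A")
    z_beta = mean(Z(p) for p, k in zip(prop_marked_a, keyed) if k == "B")
    return z_alpha, z_beta


def postulated_scale_scores(r_a, w_a, n_alpha, n_beta, z_alpha, z_beta):
    """(21)-(22): slope (content C_5) and midpoint height (set S_5) of the line
    through (Z_alpha, Z of R_A/N_alpha) and (Z_beta, Z of W_A/N_beta)."""
    z_hi = Z(r_a / n_alpha)   # proportion of A-keyed items this examinee marked A
    z_lo = Z(w_a / n_beta)    # proportion of B-keyed items this examinee marked A
    c5 = (z_hi - z_lo) / (z_alpha - z_beta)
    s5 = (z_hi + z_lo) / 2
    return c5, s5
```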


Correlation Procedure 


If the items can be scaled, an alternative content score is the biserial correlation between the examinee's dichotomized responses and the scale values of the items along a content continuum. Both Tucker [6] and Lord [4] have shown that this biserial correlation is a simple function of the slope of the characteristic curve, provided the scale values of the items are normally distributed for the test under consideration. Thus, the correlation procedure provides another means of estimating the slope of the characteristic curve shown in Figure 3.

It also seems reasonable to credit an examinee who can rank all the items in the test correctly along the content continuum with the ability to make very good discriminations with respect to content. This holds regardless of where he would locate the items as a group along the axis.

Fig. 4. An Approximation to the Item Characteristic Curve. (Vertical axis: normal deviate for proportion of items an individual marked A; horizontal axis: tendency, in normal deviates, of item groups to represent A.)
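The formula for biserial correlation is

$$r_{bis} = \frac{(M_1 - M_0)\,pq}{z\sigma}. \tag{23}$$

Suppose $A$, the proportion of examinees (in a standard group) that marked the item A, is used as the scale value of the item. Then the elements of the biserial correlation formula can be expressed in the present notation as follows:

$$p = \frac{R_A + W_A}{N_\alpha + N_\beta}, \qquad q = \frac{R_B + W_B}{N_\alpha + N_\beta},$$

$$M_1 = \frac{\sum A_{\alpha a_i} + \sum A_{\beta a_i}}{R_A + W_A}, \qquad M_0 = \frac{\sum A_{\alpha b_i} + \sum A_{\beta b_i}}{R_B + W_B},$$

$\sigma = \sigma_A$, the standard deviation of the item scale values; $z$ = the ordinate of the unit normal curve at the normal deviate corresponding to $p$.

There seems to be no direct suggestion for a set score from this procedure.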
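The correlation-procedure content score C_6 can be sketched from the elements listed above. The population standard deviation is assumed for σ. z is taken as the normal ordinate at the deviate for p, as in the usual biserial formula. Names are illustrative.

```python
from statistics import NormalDist, pstdev


def biserial_content_score(scale_values, marked_a):
    """C_6: biserial r between one examinee's A/B marks and the item scale values.

    scale_values: A for each item (proportion of a standard group marking it A).
    marked_a: True where this examinee marked the item A.
    """
    n = len(scale_values)
    on = [v for v, m in zip(scale_values, marked_a) if m]
    off = [v for v, m in zip(scale_values, marked_a) if not m]
    if not on or not off:
        raise ValueError("biserial r is undefined when every item is marked alike")
    p, q = len(on) / n, len(off) / n
    m1, m0 = sum(on) / len(on), sum(off) / len(off)
    sigma = pstdev(scale_values)
    # Ordinate of the unit normal at the point that divides p from q.
    z = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (m1 - m0) * p * q / (z * sigma)
```

Reversing every mark reverses the sign of the score. Like any biserial coefficient, C_6 can exceed 1 in small or non-normal item sets.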




Comparison of the Methods 


When several approaches are proposed for the solution of a single problem, it is imperative to determine the extent to which the various solutions produce similar results. Accordingly, all of the formulas have been compared. Those that were neither algebraically identical nor linearly equivalent were used to obtain separate set and content scores for 62 individuals. These individuals had taken an experimental test designed to measure one aspect of report-writing ability.

First consider the content scores, written solely in terms of the readily obtained quantities $R_A$, $R_B$, $N_\alpha$, and $N_\beta$:


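$$C_1 = R_A + R_B; \tag{24}$$

$$C_2 = \frac{R_A}{R_A + N_\beta - R_B} + \frac{R_B}{R_B + N_\alpha - R_A}; \tag{25}$$

$$C_3 = R_A\frac{N_\beta}{N_\alpha} + R_B\frac{N_\alpha}{N_\beta} - (N_\beta - R_B + N_\alpha - R_A) \tag{26}$$

$$= (N_\alpha + N_\beta)\left[\frac{R_A}{N_\alpha} + \frac{R_B}{N_\beta} - 1\right]; \tag{27}$$

$$C_4 = .707\left\{\left[\frac{R_A}{N_\alpha} - \frac{N_\beta - R_B}{N_\beta}\right] + 1\right\} \tag{28}$$

$$= .707\left[\frac{R_A}{N_\alpha} + \frac{R_B}{N_\beta}\right]; \tag{29}$$

$$C_5 = \frac{Z_{R_A/N_\alpha} - Z_{(N_\beta - R_B)/N_\beta}}{Z_\alpha - Z_\beta} \tag{30}$$

$$= \frac{Z_{R_A/N_\alpha} - Z_{1 - R_B/N_\beta}}{Z_\alpha - Z_\beta}. \tag{31}$$

If $Z_\alpha - Z_\beta$ is taken as the unit of the scale, then, noting that $Z_{1-p} = -Z_p$,

$$C_5 = Z_{R_A/N_\alpha} + Z_{R_B/N_\beta}; \tag{32}$$

$$C_6 = \left[\frac{\sum A_{\alpha a_i} + \sum A_{\beta a_i}}{N_\beta + (R_A - R_B)} - \frac{\sum A_{\alpha b_i} + \sum A_{\beta b_i}}{N_\alpha - (R_A - R_B)}\right]\frac{pq}{z\sigma} \tag{33}$$

$$= \frac{[N_\alpha - (R_A - R_B)]\left(\sum A_{\alpha a_i} + \sum A_{\beta a_i}\right) - [N_\beta + (R_A - R_B)]\left(\sum A_{\alpha b_i} + \sum A_{\beta b_i}\right)}{z\sigma(N_\alpha + N_\beta)^2}, \tag{34}$$

where

$$p = \frac{N_\beta + (R_A - R_B)}{N_\alpha + N_\beta} \quad\text{and}\quad q = \frac{N_\alpha - (R_A - R_B)}{N_\alpha + N_\beta}.$$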


Expressed in these terms, it is readily apparent that $C_3$ and $C_4$ rank examinees in the same order. Both are linear functions of $(R_A/N_\alpha) + (R_B/N_\beta)$, which, for convenience, will be called the simplified ratio score and designated $C_7$. It is also interesting to note that when $N_\alpha = N_\beta$, $C_7$ is perfectly correlated with the number correct.


Next, consider the set scores, again writing each in terms of the quantities $R_A$, $R_B$, $N_\alpha$, and $N_\beta$:


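$$S_1 = R_A + N_\beta - R_B - R_B - N_\alpha + R_A \tag{35}$$

$$= N_\beta - N_\alpha + 2(R_A - R_B); \tag{36}$$

$$S_2 = \frac{N_\beta - R_B}{R_A + N_\beta - R_B} - \frac{N_\alpha - R_A}{R_B + N_\alpha - R_A} \tag{37}$$

$$= \frac{R_B}{R_B + N_\alpha - R_A} - \frac{R_A}{R_A + N_\beta - R_B}; \tag{38}$$

$$S_3 = \frac{2N_\alpha(N_\beta - R_B)}{N_\alpha(N_\beta - R_B) + N_\beta(N_\alpha - R_A)} - 1 \tag{39}$$

$$= \frac{2[1 - (R_B/N_\beta)]}{2 - [(R_A/N_\alpha) + (R_B/N_\beta)]} - 1; \tag{40}$$

$$S_4 = .707\left\{\left[\frac{R_A}{N_\alpha} + \frac{N_\beta - R_B}{N_\beta}\right] - 1\right\} \tag{41}$$

$$= .707\left[\frac{R_A}{N_\alpha} - \frac{R_B}{N_\beta}\right]; \tag{42}$$

$$S_5 = \tfrac{1}{2}\left(Z_{R_A/N_\alpha} + Z_{(N_\beta - R_B)/N_\beta}\right) \tag{43}$$

$$= \tfrac{1}{2}\left(Z_{R_A/N_\alpha} - Z_{R_B/N_\beta}\right). \tag{44}$$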


In this instance, no two procedures produce identical rankings of the examinees. However, $S_4$ ranks individuals the same as $(R_A/N_\alpha) - (R_B/N_\beta)$. This procedure will therefore be designated $S_7$ hereafter, to indicate that it is the companion of the simplified ratio score, $C_7$.
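Several algebraic claims above can be spot-checked by brute force over every possible (R_A, R_B) for given N_α and N_β:

- C_3 and C_4 are linear in C_7.
- S_4 is a multiple of (R_A/N_α) − (R_B/N_β).
- The R-only forms of S_1 and S_3 agree with their definitions.

The sketch below assumes no omits. It compares (2), (13)–(14), (16), (17), and (18) against (36), (27), (40), (29), and (42). It is a verification aid, not part of the paper.

```python
import math
from itertools import product


def check_identities(n_alpha, n_beta, tol=1e-9):
    """Assert the R-only forms agree with the defining formulas for all scores."""
    h = 1 / math.sqrt(2)                               # the .707 of the text
    for r_a, r_b in product(range(n_alpha + 1), range(n_beta + 1)):
        w_a, w_b = n_beta - r_b, n_alpha - r_a         # no omits
        c7 = r_a / n_alpha + r_b / n_beta              # simplified ratio score
        d7 = r_a / n_alpha - r_b / n_beta              # companion set measure
        # (2) versus (36): exact integer identity
        assert r_a + w_a - (r_b + w_b) == n_beta - n_alpha + 2 * (r_a - r_b)
        # (13)-(14) versus (27)
        c3 = r_a + r_b - (n_alpha / n_beta) * w_a - (n_beta / n_alpha) * w_b
        assert abs(c3 - (n_alpha + n_beta) * (c7 - 1)) < tol
        # (17)-(18) versus (29) and (42)
        c4 = h * (r_a / n_alpha - w_a / n_beta + 1)
        s4 = h * (r_a / n_alpha + w_a / n_beta - 1)
        assert abs(c4 - h * c7) < tol and abs(s4 - h * d7) < tol
        # (16) versus (40), whenever S_3 is defined (i.e., not a perfect paper)
        denom = n_alpha * w_a + n_beta * w_b
        if denom:
            s3 = 2 * n_alpha * w_a / denom - 1
            assert abs(s3 - (2 * (1 - r_b / n_beta) / (2 - c7) - 1)) < tol
    return True
```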

The examination used in the empirical comparison of the methods was designed to measure a person's ability to recognize when one expression could be substituted for another without altering the meaning of the statement. Each item consisted of a short statement containing an underlined word or expression, with an alternative expression in parentheses at the end of the sentence. The task was to indicate whether the alternative expression could be substituted for the original one without changing the possible consequences, should some policy decision, administrative action, or legal claim hinge upon the interpretation of the statement. In scoring the test, 70 such statements were used. Forty-five items were keyed "same" (i.e., the consequences would be the same no matter which alternative expression was used), and the remaining 25 were keyed "different."

In an experimental tryout, the test was administered to 62 students in a graduate school of journalism. The corrected odd-even reliability of the total score was only .47. In contrast, the same reliabilities obtained by scoring separately the items keyed "same" and those keyed "different" were .79 and .72, respectively. The correlation between scores on the "same" items and scores on the "different" items was −.54, which underlines the point. Together, these results led to the conclusion that a response set was affecting the results.

These results suggested a real difference among individuals on a variable other than the one measured by the total score. The data therefore seemed appropriate for comparing the various procedures suggested for obtaining separate content and set components of a test score. Consequently, the following scores were all computed from the data:

- $C_1$ and $S_1$ (total score and simple difference);
- $C_2$ and $S_2$ (content and set by the warranted set procedure);
- $S_3$ (set by the postulated knowledge procedure);
- $C_5$ and $S_5$ (content and set by the postulated scale score procedure);
- $C_6$ (content by the correlation procedure);
- $C_7$ and $S_7$ (content and set by the simplified ratio procedure).

Their intercorrelations are presented in Table 1. The values below the diagonal represent the median value of the correlations in the respective block.


Discussion and Conclusion 


It will be recalled that two fundamentally different models have been used to derive the various indices.

- The first model assumes that there are right and wrong answers to the items, and that the degree of set can be determined from the answers that disagree with a key. The total score, warranted set, simplified ratio, and postulated knowledge procedures are clearly of this type.
- The second model, whose prototype is the correlation procedure, assumes instead that the items can be scaled along a continuum.

The postulated scale score is based on the second model but requires a key to obtain a practical solution. It might thus be considered a compromise.

Both the comparison of the formulas and the results of the empirical illustration make it tempting to conclude that in many instances it will make little difference which method is used. This is particularly true for measures of set, where the [...] the methods vary from [...]. McCornack [5] has pointed out the danger of assuming that just because two [...] correlate highly they will have the same external validity. He shows, for [...] which correlate .94 with one another, that when one correlates [...] with a criterion, the other might correlate anywhere between .29 and .84 with that same criterion. It is therefore extremely important, whenever feasible, to use an external criterion in selecting a method for obtaining separate set and content components of a test score.

TABLE 1

Intercorrelation Among Various Set and Content Components of a Test Score

N = 62

[Table entries are not legible in the scanned source.]

*For 60 degrees of freedom, the 5% and 1% values of r are .250 and .325, respectively.

When no external criterion is available, [...] important. For example, the simplified ratio procedure stands out in three ways:

- It has the highest first centroid loading in the matrix of set scores.
- It has the next-to-highest loading in the matrix of content scores.
- Its average correlation between content and set scores is closer to zero than that of any other method.

It is particularly encouraging that this occurred in a case clearly better suited to the continuity model. Use of the continuity model requires considerable extra work, both in scaling the items and in computing the scores.

Obviously, such evidence as that presented here does not conclusively
establish the efficacy of the content and set scores described in this paper.
To do this it would be necessary to carry out studies which indicated whether 
or not the set can be independently manipulated through the use of various 
experimental controls. For example, one might design an experiment which 
would involve the administration of a test such as that of alternative expres- 
sions under two or more conditions that differ in the extent to which set 
is likely to occur. Or, one might try writing tests which would yield the
same content scores but different set scores when different item forms 
were used. Beyond this, the usefulness of such scores would have to be 
established by the usual reliability and validity studies in the context of 
a particular applied situation. 

This evidence does suggest, however, that one or more of the procedures 
developed here might be useful in a number of situations. In the absence 
of an external criterion against which to compare the methods and without 
further experimental evidence, present indications are that the simplified
ratio procedure will provide an adequate approximation to set and content
components of a test score, except when extreme accuracy is needed and, in
addition, the continuity model is obviously the most appropriate.
In this latter case, use of the correlation procedure to obtain content scores 
would appear to be justified. 


REFERENCES 


[1] Cronbach, L. J. Response sets and test validity. Educ. psychol. Measmt, 1946, 6, 475-494.
[2] Cronbach, L. J. Further evidence on response sets and test design. Educ. psychol. Measmt, 1950, 10, 3-31.
[3] Gage, N. L. and Cronbach, L. J. Conceptual and methodological problems in interpersonal perception. Psychol. Rev., 1955, 62, 401-422.
[4] Lord, F. M. A theory of test scores. Psychometric Monogr. No. 7, 1952. Chicago: Univ. of Chicago Press.
[5] McCornack, R. L. A criticism of studies comparing item-weighting methods. J. appl. Psychol., 1956, 40, 343-344.
[6] Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika, 1946, 11, 1-13.


Manuscript received 11/29/56
Revised manuscript received 4/11/57


PSYCHOMETRIKA—VOL. 22, No. 4 
DECEMBER, 1957 


ITEM SELECTION METHODS FOR INCREASING TEST 
HOMOGENEITY 


HAROLD WEBSTER 


VASSAR COLLEGE 


A number of methods for increasing test homogeneity by item selection are discussed. Exact selection conditions that maximize obtained homogeneity, as measured by KR-20 and KR-21, are derived, and an application is given. Since they require only item count data, the selection conditions are economical to apply.


A problem which is likely to arise whenever psychological tests are 
constructed is that of adding or discarding items in such a way that the 
resulting test will have some optimum degree of homogeneity or "split-test"
reliability. Adkins [1] and Davis [4] have reviewed various practical solutions 
in general use. A popular method consists in retaining items with high item- 
test correlations and discarding those with low correlations, but this is not 
the best way to increase obtained homogeneity as measured by the reliability 
coefficients in common use, KR-20 and KR-21, originally derived by Kuder 
and Richardson [10]. The purpose of the present paper is to present exact 
and economical item selection methods for increasing Kuder-Richardson 
reliability. 
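For reference, the two Kuder-Richardson coefficients can be computed from item count data alone. The sketch below uses the standard textbook forms:

- KR-20 = k/(k − 1) · (1 − Σpq / V_T);
- KR-21 = k/(k − 1) · [1 − M(k − M) / (k V_T)].

Here k is the number of items, p and q = 1 − p are item proportions correct, M is the mean total score, and V_T is the total-score variance. The population variance is used for V_T; this is an assumption, since conventions differ. The sketch is not the item-selection method developed in this paper.

```python
from statistics import mean, pvariance


def kr20(scores):
    """KR-20 for a persons-by-items matrix of 0/1 item scores."""
    n, k = len(scores), len(scores[0])
    var_t = pvariance([sum(row) for row in scores])      # V_T (population form)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n            # proportion passing item j
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_t)


def kr21(scores):
    """KR-21: uses only the mean and variance of the total scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m, var_t = mean(totals), pvariance(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * var_t))
```

For the four-person, three-item matrix [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]], the sketch gives KR-20 = .75 and KR-21 = .60. KR-21 is the lower value because the item difficulties are unequal.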

There has been some debate concerning the adequacy of KR-type 
coefficients as measures of test homogeneity [11, 9]. In this paper homogeneity 
will be defined either as KR-20 or else as a rather general coefficient due to
Lord [13], which is formally the same as KR-21; which coefficient is better 
to use in the item selection conditions probably depends, to judge from 
applications, on the sampling theory presented by Lord [14]. No attempt 
will be made to improve these definitions of homogeneity even though 
inadequacies are recognized, some of which are mentioned in the next para- 
graph. 
Although the item selection conditions which will be derived could be used to maximize homogeneity, there are important reasons why maximizing homogeneity for a given sample will usually be impractical. First, for a given sample, small increases in homogeneity, especially in its upper range, are likely to be lost in subsequent samples because of unknown sampling variations. Second, although cases where homogeneity is impractically high are seldom encountered, it is known (when items are assigned point scores) that tests with homogeneity approaching unity will have undesirable item redundancy [7, 16]. Finally, when using items for which the direction of scoring is fixed, increasing homogeneity beyond a certain value tends rapidly to increase the proportion of items retained which have extreme means, and hence the skewness of the test score distribution. For these reasons, [...] selection the observed homogeneity of a single test, [...]
Gulliksen ([8], p. 379) has suggested a graphical method for selecting items [...] to increase homogeneity as measured [...]. [...] derived in the present paper which use KR-20 or KR-21, which allow re-examination later of previously rejected items to see if they should be put back in the test, and which do not require plotting.


In this and the next section several [...] test homogeneity, including the popular [...], will be considered. Methods which [...] and KR-21.
The usual definition of reliability is

$$r_{TT} = \frac{V_T - E_T}{V_T} = 1 - \frac{E_T}{V_T}, \tag{1}$$

where $V_T$ is the total variance and $E_T$ is the error variance of test $T$. Either KR-20 or KR-21, which are used to define homogeneity, may be obtained directly from (1), depending upon how $E_T$ is defined. In addition to selecting (or discarding) items in such a way that the entire ratio (1) is increased, there are other ways in which $r_{TT}$ might be increased: by selecting those items which (i) increase the true variance $V_T - E_T$, or (ii) decrease $E_T$, or (iii) increase $V_T$. Conditions derived to achieve any one of these aims alone were found to have serious limitations when used in iterations for the purpose of increasing $r_{TT}$. For example, it was found in several applications that no item discarded or added would decrease $E_T$. Also, [...] inefficient; because of the relative stability of $E_T$, (i) appears to have much the same disadvantage as (iii), the inefficiency of which will next be shown.
The variance of test $T$ minus item $j$ may be written

$$V_{T-j} = V_T + V_j - 2C_{jT}, \tag{2}$$

where $C_{jT}$ is the covariance of item $j$ and test $T$. If item $j$ satisfies the condition $V_T < V_{T-j}$, then it could be discarded to increase the test variance. Using (2), this is seen to be the same as discarding $j$ if

$$C_{jT} < V_j/2. \tag{3}$$


Dividing both sides of (3) by the product of standard deviations, $S_j S_T$, gives

$$r_{jT} < S_j/2S_T. \tag{4}$$


By a similar development, adding an item $k$ not already in $T$ will increase the variance if

$$r_{kT} > -S_k/2S_T. \tag{5}$$
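Identity (2) and the covariance forms of (3) and (5) can be checked numerically. The sketch below is an illustration only. The 0/1 response matrix `X` is made up, variances and covariances use the divisor $N$, and exact fractions are used so that borderline comparisons are not disturbed by rounding.

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance with divisor N, computed exactly."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical 0/1 responses: 8 subjects (rows) by 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
items = [list(col) for col in zip(*X)]  # item score vectors
T = [sum(row) for row in X]             # test scores
V_T = cov(T, T)

discard_flags = []
for x_j in items:
    V_j, C_jT = cov(x_j, x_j), cov(x_j, T)
    T_minus_j = [t - x for t, x in zip(T, x_j)]
    V_T_minus_j = cov(T_minus_j, T_minus_j)
    # Identity (2): V_{T-j} = V_T + V_j - 2 C_jT
    assert V_T_minus_j == V_T + V_j - 2 * C_jT
    # Condition (3): dropping j raises the variance exactly when C_jT < V_j/2
    assert (C_jT < V_j / 2) == (V_T_minus_j > V_T)
    # Condition (5) in covariance form: adding j back to T - j raises the
    # variance exactly when C_{j,T-j} > -V_j/2
    assert (cov(x_j, T_minus_j) > -V_j / 2) == (V_T > V_T_minus_j)
    discard_flags.append(C_jT < V_j / 2)

print(V_T, discard_flags)  # 27/16 [False, False, False, False, False]
```

With these data, no item satisfies (3). This fits the remark that the right-hand sides of (4) and (5) are small, so almost any positively correlated item passes the variance test.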


Now item selection based on alternate applications of (4) and (5) will always increase test variance. The quantities on the right in (4) and (5) are, however, quite small, and as the standard deviation $S_T$ increases, these expressions approach even more closely the condition that items be discarded or added merely if their test correlations are, respectively, negative or positive. But the latter condition is known to be an inefficient method for increasing homogeneity, for it can be seen that one effect of applying (4) and (5), if enough items are available, will be to form a very long test. It has also been shown by other methods that if a test is long enough, practically all items with item-test correlations exceeding zero will, if added to the test, contribute to its homogeneity [3, 17]. Items with low correlations may contribute very little, however, and efficiency in testing requires that the shortest tests which achieve a specified homogeneity be used. Bedell [2] derived equations which could be solved for the number of items with lowest item-test correlations to be discarded in order to maximize the reliability of a single-factor test. Unit rank for the item matrix was assumed, and some computational approximations were developed.




A popular method is to discard item $j$ from test $T$ if

$$r_{jT} < k, \tag{6}$$

where $k$ is a positive constant. The requirement that retained items be significantly correlated with their own test is approximately satisfied if a number of items satisfying (6) are discarded at once when $k$ is some multiple of the standard error of $r_{jT}$. But if moderately large samples of subjects and items are available, and $k$ is chosen to correspond to one of the usual levels of significance, then the obtained homogeneity is usually found to be decreased after applying (6).

Another method which is independent of test length will next be discussed briefly. If item $j$ satisfies the inequality

$$r_{TT} < r_{T-j,T-j}, \tag{7}$$

where the $r$'s are homogeneity coefficients for test $T$ and for test $T$ minus item $j$, respectively, then discarding $j$ will increase homogeneity. A similar expression can be written for the case where adding an item to $T$ increases homogeneity. But whether or not $j$ satisfies (7), or the corresponding addition condition, could depend, especially for short tests, on the length of $T$. It might therefore be argued that if items are to remain in $T$ on their own merits, (7) should be rewritten so that it is independent of the length of $T$. It can be shown, however, that this is not advantageous, for it leads to expressions whose application will eventually decrease homogeneity. To show this, first multiply the left side of (7) by $(n - 1)/(n - r_{TT})$; the product $(n-1)r_{TT}/(n - r_{TT})$ is the Spearman-Brown estimate of the homogeneity of a comparable test of $n - 1$ items. This is the change required to make (7) independent of test length, a fact which can be proved by next rearranging the terms to correspond to functions which are known to be invariant with respect to test length ([8], p. 85). Finally, the members of this rearranged inequality can be shown (by adding 1 to both sides and taking reciprocals) to have exactly the same effect in item selection as if they were estimates of the squared average item-test correlation. Therefore, if (7) is directly altered to make it independent of test length, its application will always increase the average item-test correlation, but will not always increase homogeneity. In fact, if it is used in iterations, it will reject successive halves (approximately) of the items in the test, and the analogous addition condition will not bring back into the test any items previously rejected; consequently $r_{TT}$ decreases sharply after the test is shortened beyond a certain point.
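The multiplier $(n-1)/(n-r_{TT})$ is the Spearman-Brown adjustment for shortening a test by one item. The quantity $r/[n(1-r)]$ is one function that Spearman-Brown changes of length leave unchanged. The sketch below checks both facts with exact fractions. The function names are hypothetical, and the identities are the standard Spearman-Brown ones rather than a transcription of the author's derivation.

```python
from fractions import Fraction


def spearman_brown(r, m):
    """Reliability predicted when test length is multiplied by m."""
    return m * r / (1 + (m - 1) * r)


def adjusted_left_side(r_TT, n):
    """Left side of (7) multiplied by (n - 1)/(n - r_TT)."""
    return r_TT * (n - 1) / (n - r_TT)


def length_invariant(r, n):
    """r / [n(1 - r)]; unchanged when length changes per Spearman-Brown."""
    return r / (n * (1 - r))


r_TT, n = Fraction(3, 5), 10
shortened = adjusted_left_side(r_TT, n)

# The adjusted left side is Spearman-Brown with m = (n - 1)/n ...
assert shortened == spearman_brown(r_TT, Fraction(n - 1, n))
# ... and r/[n(1 - r)] is the same before and after the change of length.
assert length_invariant(shortened, n - 1) == length_invariant(r_TT, n)

print(shortened)  # 27/47
```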


Item Selection Conditions Which Are Dependent on Test Length 


We return therefore to (7) as the condition for discarding item $j$ in order to increase test homogeneity. As in (1), the homogeneity of the shortened test is

$$r_{T-j,T-j} = 1 - \frac{E_{T-j}}{V_{T-j}}. \tag{8}$$




Lord [13, 14] has shown that if $E_T$ is defined as the mean of the estimated sampling variances (based on random samples of $n$ items) of the $N$ subjects' test scores $T_i$, then

$$E_T = \frac{n\bar{T} - \bar{T}^2 - V_T}{n - 1}, \tag{9}$$

where $\bar{T}$ is the sample mean. Substitution of (9) in (1) provides a measure of reliability which is formally identical with KR-21 but which is actually more general than either the latter or KR-20. The error variance of the shortened test can be written, as in (9),

$$\begin{aligned} E_{T-j} &= \frac{(n-1)(\bar{T} - p_j) - (\bar{T} - p_j)^2 - V_{T-j}}{n - 2} \\ &= \frac{(n-1)\bar{T} - \bar{T}^2 - V_T - p_j(n - 2\bar{T}) + 2C_{jT}}{n - 2}, \end{aligned} \tag{10}$$

where $p_j$ is the mean of item $j$. The second form follows from (2) and from $V_j = p_j(1 - p_j)$ for an item scored 0 or 1.
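Equations (9) and (10) can be verified on a small example. The sketch below uses a made-up 0/1 response matrix (hypothetical data, variances with divisor $N$). It takes the per-subject error variance to be the binomial estimate $T_i(n - T_i)/(n - 1)$, which is the reading of Lord's definition assumed here. Items must be scored 0 or 1, since that is what allows the second form of (10).

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def var(xs):
    """Variance with divisor N."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical 0/1 responses: 8 subjects (rows) by n = 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
n = len(X[0])
T = [sum(row) for row in X]
T_bar, V_T = mean(T), var(T)

# (9): the mean of T_i(n - T_i)/(n - 1) over subjects equals the closed form.
E_T_direct = mean([Fraction(t * (n - t), n - 1) for t in T])
E_T = (n * T_bar - T_bar**2 - V_T) / (n - 1)
assert E_T_direct == E_T

# Substituting (9) in (1) reproduces KR-21.
r_lord = 1 - E_T / V_T
kr21 = Fraction(n, n - 1) * (1 - T_bar * (n - T_bar) / (n * V_T))
assert r_lord == kr21

# (10): error variance of T - j from the shortened scores vs. the closed form.
for x_j in zip(*X):
    p_j, C_jT = mean(x_j), cov(x_j, T)
    T_minus_j = [t - x for t, x in zip(T, x_j)]
    E_direct = mean([Fraction(t * (n - 1 - t), n - 2) for t in T_minus_j])
    E_10 = ((n - 1) * T_bar - T_bar**2 - V_T
            - p_j * (n - 2 * T_bar) + 2 * C_jT) / (n - 2)
    assert E_direct == E_10

print(E_T, r_lord)  # 9/8 1/3
```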

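Substituting (1), (8), (2), (9), and (10) in (7) and simplifying, it is found that item $j$ may be discarded to increase homogeneity (as measured either by Lord's formula or by KR-21) if

$$C_{jT} - k_1 V_j - k_2 p_j < k_3, \tag{11}$$

where the constants in (11) are

$$\begin{aligned} k_1 &= \frac{(n-2)(n\bar{T} - \bar{T}^2 - V_T)}{2[(n-2)(n\bar{T} - \bar{T}^2) + V_T]}, \\ k_2 &= \frac{(n-1)(n - 2\bar{T})V_T}{2[(n-2)(n\bar{T} - \bar{T}^2) + V_T]}, \\ k_3 &= \frac{V_T(\bar{T}^2 - \bar{T} + V_T)}{2[(n-2)(n\bar{T} - \bar{T}^2) + V_T]}. \end{aligned}$$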
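Condition (11) can be transcribed directly and checked against the defining inequality (7). In this sketch (hypothetical 0/1 data, exact fractions, divisor $N$), `discard_by_11` is an illustrative name, not something from the paper. Its verdict for each item should agree with recomputing KR-21 for $T - j$.

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance with divisor N, computed exactly."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def lord_kr21(totals, n):
    """1 - E_T/V_T with E_T from (9); formally identical with KR-21."""
    T_bar, V_T = mean(totals), cov(totals, totals)
    E_T = (n * T_bar - T_bar**2 - V_T) / (n - 1)
    return 1 - E_T / V_T


def discard_by_11(C_jT, V_j, p_j, n, T_bar, V_T):
    """True when condition (11) says discarding item j raises KR-21."""
    denom = 2 * ((n - 2) * (n * T_bar - T_bar**2) + V_T)
    k1 = (n - 2) * (n * T_bar - T_bar**2 - V_T) / denom
    k2 = (n - 1) * (n - 2 * T_bar) * V_T / denom
    k3 = V_T * (T_bar**2 - T_bar + V_T) / denom
    return C_jT - k1 * V_j - k2 * p_j < k3


# Hypothetical 0/1 responses: 8 subjects (rows) by 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
n = len(X[0])
T = [sum(row) for row in X]
T_bar, V_T = mean(T), cov(T, T)
r_T = lord_kr21(T, n)

decisions = []
for x_j in zip(*X):
    by_11 = discard_by_11(cov(x_j, T), cov(x_j, x_j), mean(x_j), n, T_bar, V_T)
    T_minus_j = [t - x for t, x in zip(T, x_j)]
    by_direct = lord_kr21(T_minus_j, n - 1) > r_T   # inequality (7)
    assert by_11 == by_direct
    decisions.append(by_11)

print(decisions)  # [False, False, False, False, True]
```

Only the last item, which is nearly unrelated to the others, qualifies for discarding here. The first item leaves KR-21 exactly unchanged, and both strict forms correctly reject it.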

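Suppose that item $k$ is not in test $T$. By a derivation analogous to the above, item $k$ may be added to $T$ to increase homogeneity if

$$C_{kT} + K_1 V_k - K_2 p_k > K_3, \tag{12}$$

where the constants are

$$\begin{aligned} K_1 &= \frac{n(n\bar{T} - \bar{T}^2 - V_T)}{2(n^2\bar{T} - n\bar{T}^2 - V_T)}, \\ K_2 &= \frac{(n-1)(n - 2\bar{T})V_T}{2(n^2\bar{T} - n\bar{T}^2 - V_T)}, \\ K_3 &= \frac{V_T(\bar{T}^2 - \bar{T} + V_T)}{2(n^2\bar{T} - n\bar{T}^2 - V_T)}. \end{aligned}$$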
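The addition condition (12) can be checked the same way. Each item in turn is treated as the candidate $k$, and the other four items form test $T$. As before, the data are hypothetical and `add_by_12` is an illustrative name. The sign of the $K_1 V_k$ term follows the derivation from (7) with $V_k = p_k(1 - p_k)$.

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance with divisor N, computed exactly."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def lord_kr21(totals, n):
    """1 - E_T/V_T with E_T from (9); formally identical with KR-21."""
    T_bar, V_T = mean(totals), cov(totals, totals)
    E_T = (n * T_bar - T_bar**2 - V_T) / (n - 1)
    return 1 - E_T / V_T


def add_by_12(C_kT, V_k, p_k, n, T_bar, V_T):
    """True when (12) says adding item k to the n-item test T raises KR-21."""
    denom = 2 * (n**2 * T_bar - n * T_bar**2 - V_T)
    K1 = n * (n * T_bar - T_bar**2 - V_T) / denom
    K2 = (n - 1) * (n - 2 * T_bar) * V_T / denom
    K3 = V_T * (T_bar**2 - T_bar + V_T) / denom
    return C_kT + K1 * V_k - K2 * p_k > K3


# Hypothetical 0/1 responses: 8 subjects (rows) by 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
T_full = [sum(row) for row in X]
r_full = lord_kr21(T_full, len(X[0]))

decisions = []
for x_k in zip(*X):
    # The other four items form test T; item k is the candidate.
    T = [t - x for t, x in zip(T_full, x_k)]
    n = len(X[0]) - 1
    by_12 = add_by_12(cov(x_k, T), cov(x_k, x_k), mean(x_k), n, mean(T), cov(T, T))
    by_direct = r_full > lord_kr21(T, n)
    assert by_12 == by_direct
    decisions.append(by_12)

print(decisions)  # [False, True, True, True, False]
```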


It is interesting that the item mean $p$ remains explicit in (11) and (12). This is not so for conditions derived using KR-20. One can discard $j$ to increase homogeneity as measured by KR-20 if

$$\frac{n}{n-1}\left[1 - \frac{\sum V_i}{V_T}\right] < \frac{n-1}{n-2}\left[1 - \frac{\sum V_i - V_j}{V_{T-j}}\right], \tag{13}$$

where the sums run over the $n$ items of $T$. Using (2), this can be simplified to

$$C_{jT} - h_1 V_j < h_2, \tag{14}$$

the constants being

$$\begin{aligned} h_1 &= \frac{[(n-1)^2 + 1]V_T + n(n-2)\sum V_i}{2[V_T + n(n-2)\sum V_i]}, \\ h_2 &= \frac{V_T(V_T - \sum V_i)}{2[V_T + n(n-2)\sum V_i]}. \end{aligned}$$
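The KR-20 discarding condition (14) can be checked against inequality (13) by recomputing KR-20 without each item. This is a sketch on the same hypothetical 0/1 data (exact fractions, divisor $N$). `kr20` and `discard_by_14` are illustrative names.

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance with divisor N, computed exactly."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def kr20(cols):
    """KR-20 for the test made of the given item columns."""
    n = len(cols)
    totals = [sum(vals) for vals in zip(*cols)]
    sum_V = sum(cov(c, c) for c in cols)
    return Fraction(n, n - 1) * (1 - sum_V / cov(totals, totals))


def discard_by_14(C_jT, V_j, n, V_T, sum_V):
    """True when (14) says discarding item j raises KR-20."""
    denom = 2 * (V_T + n * (n - 2) * sum_V)
    h1 = (((n - 1) ** 2 + 1) * V_T + n * (n - 2) * sum_V) / denom
    h2 = V_T * (V_T - sum_V) / denom
    return C_jT - h1 * V_j < h2


# Hypothetical 0/1 responses: 8 subjects (rows) by 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
items = [list(col) for col in zip(*X)]
n = len(items)
T = [sum(row) for row in X]
V_T = cov(T, T)
sum_V = sum(cov(c, c) for c in items)
r_T = kr20(items)

decisions = []
for j, x_j in enumerate(items):
    by_14 = discard_by_14(cov(x_j, T), cov(x_j, x_j), n, V_T, sum_V)
    by_direct = kr20(items[:j] + items[j + 1:]) > r_T   # inequality (13)
    assert by_14 == by_direct
    decisions.append(by_14)

print(r_T, decisions)  # 35/72 [False, False, False, False, True]
```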

Similarly, adding item $k$ to $T$ will increase KR-20 if

$$C_{kT} - H_1 V_k > H_2, \tag{15}$$

where

$$\begin{aligned} H_1 &= \frac{n^2(V_T - \sum V_i)}{2(n^2\sum V_i - V_T)}, \\ H_2 &= \frac{V_T(V_T - \sum V_i)}{2(n^2\sum V_i - V_T)}. \end{aligned}$$
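Condition (15) can be checked in the same way. In this sketch each item in turn is the candidate $k$, and the remaining items form test $T$. The data are hypothetical and `add_by_15` is an illustrative name.

```python
from fractions import Fraction


def mean(xs):
    return Fraction(sum(xs), len(xs))


def cov(xs, ys):
    """Covariance with divisor N, computed exactly."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def kr20(cols):
    """KR-20 for the test made of the given item columns."""
    n = len(cols)
    totals = [sum(vals) for vals in zip(*cols)]
    sum_V = sum(cov(c, c) for c in cols)
    return Fraction(n, n - 1) * (1 - sum_V / cov(totals, totals))


def add_by_15(C_kT, V_k, n, V_T, sum_V):
    """True when (15) says adding item k to the n-item test T raises KR-20."""
    denom = 2 * (n**2 * sum_V - V_T)
    H1 = n**2 * (V_T - sum_V) / denom
    H2 = V_T * (V_T - sum_V) / denom
    return C_kT - H1 * V_k > H2


# Hypothetical 0/1 responses: 8 subjects (rows) by 5 items (columns).
X = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
items = [list(col) for col in zip(*X)]
r_full = kr20(items)

decisions = []
for k, x_k in enumerate(items):
    base_cols = items[:k] + items[k + 1:]          # test T without item k
    n = len(base_cols)
    T = [sum(vals) for vals in zip(*base_cols)]
    V_T = cov(T, T)
    sum_V = sum(cov(c, c) for c in base_cols)
    by_15 = add_by_15(cov(x_k, T), cov(x_k, x_k), n, V_T, sum_V)
    by_direct = r_full > kr20(base_cols)
    assert by_15 == by_direct
    decisions.append(by_15)

print(decisions)  # [True, True, True, True, False]
```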


Applications of (11), (12), (14), and (15) 


Computations of the item selection conditions to increase either KR-21 
or KR-20 are not as laborious as they may first appear. By grouping the 
test distribution into five symmetrical categories as recommended by Flanagan 
[5], and obtaining item counts for the four extreme categories, calculation 
of the test variances and item-test covariances required in the conditions 
can be carried out rapidly. 

It is known that if the $T$ distribution is grouped into five categories containing percentages of scores (from high to low) 9, 19, 44, 19, and 9, which are assigned new scores 2, 1, 0, -1, and -2, respectively, then the grouping will not only have high efficiency, but will also incorporate an adjustment for the effect of the coarse grouping on the estimate of $r_{jT}$, the item-test point-biserial correlation [18]. The new covariances will be, for a sample of $N$ subjects,

$$C'_{jT} = \frac{2e + f - g - 2h}{N} = \frac{D_{jT}}{N}. \tag{16}$$


In (16), $e$, $f$, $g$, and $h$ are the frequencies of a preferred response to item $j$ in the extreme categories whose scores are 2, 1, -1, and -2, respectively. It can then be shown that the covariances needed in (11), (12), (14), and (15) are, to a sufficiently close approximation,

$$C_{jT} = \frac{D_{jT} \sum_i D_{iT}}{N^2}, \tag{17}$$

where the summation is over the $n$ weighted differences, corresponding to the $n$ items, as defined in (16). Also, an estimate of the variance required for computing the constants of the item selection conditions is

$$V_T = \frac{\left(\sum_i D_{iT}\right)^2}{N^2}. \tag{18}$$
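The grouping scheme and approximations (16) to (18) are sketched below. Several choices here are assumptions, not details given in the text. The data are simulated from a one-trait logistic model, ties in total score are broken arbitrarily by sort order, and category sizes are rounded from the stated percentages. The printed comparison with exact covariances is only illustrative. Note that (17) and (18) imply $\sum_j C_{jT} = V_T$ exactly, as must hold for real covariances.

```python
import math
import random
from fractions import Fraction


def grouped_scores(totals):
    """Scores 2, 1, 0, -1, -2 for the top 9%, next 19%, middle 44%,
    next 19% and bottom 9% of subjects ranked on total score.
    Ties in total score are broken arbitrarily by sort order."""
    N = len(totals)
    n_outer, n_inner = round(0.09 * N), round(0.19 * N)
    order = sorted(range(N), key=lambda i: totals[i], reverse=True)
    scores = [0] * N
    for rank, i in enumerate(order):
        if rank < n_outer:
            scores[i] = 2
        elif rank < n_outer + n_inner:
            scores[i] = 1
        elif rank >= N - n_outer:
            scores[i] = -2
        elif rank >= N - n_outer - n_inner:
            scores[i] = -1
    return scores


def weighted_differences(X, scores):
    """D_jT = 2e + f - g - 2h of (16), one value per item."""
    return [sum(s * x for s, x in zip(scores, col)) for col in zip(*X)]


# Simulated 0/1 data (hypothetical): N = 100 subjects, 20 items,
# responses driven by one normally distributed latent trait.
rng = random.Random(1957)
N, n_items = 100, 20
difficulty = [rng.uniform(-1.5, 1.5) for _ in range(n_items)]
X = []
for _ in range(N):
    theta = rng.gauss(0.0, 1.0)
    X.append([int(rng.random() < 1 / (1 + math.exp(-1.7 * (theta - b))))
              for b in difficulty])

T = [sum(row) for row in X]
scores = grouped_scores(T)
D = weighted_differences(X, scores)
sum_D = sum(D)
C_approx = [Fraction(d * sum_D, N**2) for d in D]   # (17)
V_T_approx = Fraction(sum_D**2, N**2)               # (18)

# Exact values (divisor N) for comparison.
T_bar = sum(T) / N
V_T_exact = sum((t - T_bar) ** 2 for t in T) / N
C_exact = [sum(x * (t - T_bar) for x, t in zip(col, T)) / N for col in zip(*X)]

print(f"V_T: exact {V_T_exact:.2f}, approx (18) {float(V_T_approx):.2f}")
for j in range(3):
    print(f"item {j}: C_jT exact {C_exact[j]:.3f}, "
          f"approx (17) {float(C_approx[j]):.3f}")
```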


As an example, Table 1 presents data which show how the item selection conditions increased homogeneity for a sample of 100 college women. The test used was the De scale [6] from the California Psychological Inventory, a personality test known to discriminate delinquent from nondelinquent persons in numerous samples. In a number of previous samples, KR-21 for the complete scale of 54 items has been found to fall in the range .50 to .60.

TABLE 1

Variations in Homogeneity Due to Selecting Items
so as to Increase KR-21 and KR-20

   n    KR-21   KR-20    Mean     V_T    Discard   Add
  54    .562    .615    17.85    26.63     15       0
  39    .677    .757    11.47    23.81      7       1
  33    .708    .751     8.33    19.89      8       1
  26    .725    .764     6.15    15.52      -       -
  39    .677    .757    11.47    23.81      7       0
  32    .685    .770     9.38    19.71      5       4
  31    .693    .780     8.46    18.66      -       -
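Because KR-21 depends only on $n$, the mean and $V_T$, its column in Table 1 can be recomputed from the other columns. The sketch below copies those values from the table. The KR-20 column cannot be checked this way, because it also needs the item variances. The variance-to-mean ratio discussed below is computed alongside.

```python
# (n, KR-21 as printed, mean, V_T) for each row of Table 1
TABLE_1 = [
    (54, 0.562, 17.85, 26.63),
    (39, 0.677, 11.47, 23.81),
    (33, 0.708, 8.33, 19.89),
    (26, 0.725, 6.15, 15.52),
    (39, 0.677, 11.47, 23.81),
    (32, 0.685, 9.38, 19.71),
    (31, 0.693, 8.46, 18.66),
]


def kr21(n, mean, var):
    """KR-21 from test length, test mean and test variance."""
    return n / (n - 1) * (1 - mean * (n - mean) / (n * var))


recomputed = [kr21(n, m, v) for n, _, m, v in TABLE_1]
ratios = [v / m for _, _, m, v in TABLE_1]   # variance-to-mean ratio

for (n, printed, _, _), r, q in zip(TABLE_1, recomputed, ratios):
    print(f"n={n:2d}  printed {printed:.3f}  recomputed {r:.3f}  V_T/mean {q:.2f}")
```

Every recomputed KR-21 rounds to the printed value. Within each method, the ratio of variance to mean rises from row to row.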

The first four rows of Table 1 show how KR-20 and KR-21 varied 
when items were alternately discarded and added back in iterations using 
conditions (11) and (12). The last three rows of Table 1 show variations in 
these same coefficients when (14) and (15) were applied starting with the 
39-item test of the second row. 
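The alternating procedure of the first four rows can be sketched as a loop. The sketch rests on several assumptions. It evaluates (7) directly by recomputing KR-21, which for 0/1 items yields the same decisions as (11) and (12). It discards all qualifying items at once, as the Discard column suggests. It then judges additions against the shortened test, though the text does not say exactly how. Finally, the data are simulated and the three-item floor is an arbitrary guard. In practice the text recommends a fresh sample for each iteration.

```python
import random
from fractions import Fraction


def kr21(X, cols):
    """KR-21 (Lord's formula) for the test made of item columns `cols`."""
    n, N = len(cols), len(X)
    totals = [sum(row[j] for j in cols) for row in X]
    mean = Fraction(sum(totals), N)
    var = Fraction(sum(t * t for t in totals), N) - mean**2
    return Fraction(n, n - 1) * (1 - mean * (n - mean) / (n * var))


def one_iteration(X, test, n_items):
    """Discard, all at once, every item whose removal would raise KR-21
    (inequality (7), i.e. condition (11)); then add every item outside the
    shortened test whose addition would raise KR-21 (condition (12))."""
    r = kr21(X, test)
    drop = [j for j in test if kr21(X, [i for i in test if i != j]) > r]
    if len(test) - len(drop) < 3:        # keep at least three items
        drop = []
    kept = [j for j in test if j not in drop]
    r_kept = kr21(X, kept)
    add = [k for k in range(n_items)
           if k not in kept and kr21(X, kept + [k]) > r_kept]
    return kept + add, drop, add


# Hypothetical data: 8 items driven by a common trait, 4 pure-noise items.
rng = random.Random(7)
N, n_items = 60, 12
X = []
for _ in range(N):
    trait = rng.random()
    X.append([int(rng.random() < trait) for _ in range(8)] +
             [int(rng.random() < 0.5) for _ in range(4)])

test = list(range(n_items))
history = [kr21(X, test)]
for _ in range(10):                      # cap the number of iterations
    test, dropped, added = one_iteration(X, test, n_items)
    history.append(kr21(X, test))
    print(f"n={len(test):2d}  KR-21={float(history[-1]):.3f}  "
          f"discarded={dropped}  added={added}")
    if not dropped and not added:
        break
```

A batch step does not guarantee a gain, since each item is judged on its own against the current test. Printing `history` shows whether a given run actually increased KR-21.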


Discussion 

The method is not very time consuming, and with the help of (17) and (18) it can easily be applied using item count data. Since there were only 100 papers, the data of Table 1 required the time of one person for two days; however, this sample was used only for this example, and a much larger sample would ordinarily be preferable. It is likely that when only a single sample is available, the greater the number of iterations employed, the larger $N$ should be, in order to offset the increasing capitalization on variations peculiar to that one sample.

From Table 1 it can be seen that the ratio of variance to mean increases, for either method, with the number of iterations. This is an indication of the increasing skewness already mentioned. If the scoring for every item were reversed, so that the test became one of nondelinquency, an examination of the selection conditions shows that the skewness would still increase, but in the opposite direction. Also, in (11), the greater the skewness, the more $n - 2\bar{T}$ in the constant $k_2$ differs from zero, thus assigning increasing weight to the item means.

In Table 1, KR-20 necessarily exceeds KR-21 even when it is KR-21 that is increased. However, when conditions for increasing KR-20 are applied (last three rows of Table 1), KR-21 increases rather slowly, so that the differences between these two measures in the last two rows are larger than the difference usually found at this reliability level. Lord [14] shows that unreliability due to variations of obtained means about the true mean is included in the estimate provided by KR-21, but not in that provided by KR-20. In the iterative process of which Table 1 is an example, the true mean also must vary because the test length changes. This would seem to imply that neither reliability measure could be ideally coordinated with the underlying stochastic process. Because of the differences arising between the two coefficients in Table 1, however, it is likely that use of the conditions based on KR-21 will produce reliability which holds up better in subsequent samples. It is recommended, therefore, that (11) and (12), rather than the KR-20 conditions, be used, preferably with a new sample of subjects every time a new form of the test is scored. Even if a succession of samples is not available, one or two iterations using (11) and (12) with a large sample would seem preferable to other methods which have been considered for increasing homogeneity.


REFERENCES 


[1] Adkins, Dorothy C. A rational comparison of item-selection techniques. Psychol. Bull., 1938, 35, 655. (Abstract)

[2] Bedell, B. J. Determination of the optimum number of items to retain in a test measuring a single ability. Psychometrika, 1950, 15, 419-430.

[3] Cronbach, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 1951, 16, 297-334.

[4] Davis, F. B. Item analysis in relation to educational and psychological testing. Psychol. Bull., 1952, 49, 97-119.

[5] Flanagan, J. C. The effectiveness of short methods for calculating correlation coefficients. Psychol. Bull., 1952, 49, 342-348.

[6] Gough, H. G. and Peterson, D. R. The identification and measurement of predispositional factors in crime and delinquency. J. consult. Psychol., 1952, 16, 207-212.

[7] Gulliksen, H. The relation of item difficulty and inter-item correlation to test variance and reliability. Psychometrika, 1945, 10, 79-91.

[8] Gulliksen, H. Theory of mental tests. New York: Wiley, 1950.

[9] Horst, P. Correcting the Kuder-Richardson reliability for dispersion of item difficulties. Psychol. Bull., 1953, 50, 371-374.

[10] Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.

[11] Loevinger, Jane. A systematic approach to the construction and evaluation of tests of ability. Psychol. Monogr., 1947, 61, No. 4 (Whole No. 285).

[12] Loevinger, Jane, Gleser, Goldine C. and DuBois, P. H. Maximizing the discriminating power of a multiple-score test. Psychometrika, 1953, 18, 309-317.

[13] Lord, F. M. Estimating test reliability. Educ. psychol. Measmt, 1955, 15, 325-336.

[14] Lord, F. M. Sampling fluctuations resulting from the sampling of test items. Psychometrika, 1955, 20, 1-22.

[15] Lord, F. M. Some perspectives on "The attenuation paradox in test theory." Psychol. Bull., 1955, 52, 505-510.

[16] Tucker, L. R. Maximum validity of a test with equivalent items. Psychometrika, 1946, 11, 1-13.

[17] Webster, H. Maximizing test validity by item selection. Psychometrika, 1956, 21, 153-164.

[18] Webster, H. Transformed statistics for use in test construction. Psychol. Bull., 1956, 53, 488-102.

[19] Wherry, R. J. and Gaylord, R. H. The concept of test and item reliability in relation to factor pattern. Psychometrika, 1943, 8, 247-264.

[20] Wherry, R. J. and Winer, B. J. A method for factoring large numbers of items. Psychometrika, 1953, 18, 161-179.


Manuscript received 4/13/56 


Revised manuscript received 10/29/56 




INDEX FOR VOLUME 22 


AUTHOR 


Atkinson, Richard C. A stochastic model for rote serial learning. 87-96. 

Bass, Bernard M. Iterative inverse factor analysis—a rapid method for clustering persons, 
105-107. 

Birch, David. A model for response tendency combination. 373-380. 

Blumen, Isadore. On the ranking problem. 17-28. 

Bock, R. Darrell. Note on the least squares solution for the method of successive cate- 
gories. 231-240. 

Brogden, Hubert E. The expected variance of the sampling errors for a set of item-criterion 
correlations. 75-78. 

Brogden, Hubert E. New problems for old solutions. 301-310. 

Burke, C. J. (with W. K. Estes). A component model for stimulus variables in discrimina- 
tion learning. 133-146. 

Campbell, Donald T. (with J. W. Cotton and R. D. Malone). The relationship between 
factorial composition of test items and measures of test reliability. 347-358. 

Cartwright, Desmond S. A computational procedure for tau correlation. 97-104.

Cohen, Burton H. (with J. M. Sakoda). Exact probabilities for contingency tables using binomial coefficients. 83-86.

Cotton, John W. (with D. T. Campbell and R. D. Malone). The relationship between 
factorial composition of test items and measures of test reliability. 347-358. 

Cureton, Edward E. The upper and lower twenty-seven per cent rule. 293-296. 

Diederich, G. W. (with S. J. Messick and L. R Tucker). A general least squares solution 
for successive intervals. 159-174. 

Dwyer, Paul S. The detailed method of optimal regions. 43-52.

Ebel, Robert L. Review of "J. R. Gerberich, Specimen Objective Test Items: A Guide to Achievement Test Construction." 297-298.

Estes, W. K. Theory of learning with constant, variable, or contingent probabilities of reinforcement. 113-132.

Estes, W. K. (with C. J. Burke). A component model for stimulus variables in discrimina- 
tion learning. 133-146. 

Fan, Chung-Teh. On the applications of the method of absolute scaling. 175-184. 

Fan, Chung-Teh (with Frances Swineford). A method of score conversion through item 
statistics. 185-188. 

Guttman, Louis. A necessary and sufficient formula for matric factoring. 79-82.

Guttman, Louis. Simple proofs of relations between the communality problem and multiple 
correlation. 147-158. 

Harris, William P. A revised law of comparative judgment. 189-198.

Helmstadter, Gerald C. Procedures for obtaining separate set and content components of a test score. 381-394.

Holland, J. G. (with W. B. Knowles and E. P. Newlin). A correlational analysis of tracking behavior. 275-288.

Horst, Paul (with Charlotte MacEwan). Optimal test length for multiple prediction: the 
general case. 311-324. 

Kemeny, John G. (with J. L. Snell). Markov processes in learning theory. 221-230. 

Keats, John A. Estimation of error variances of test scores. 29-42. 




Kinder, Elaine F. (with J. Lev). New analysis of variance formulas for treating data from mutually paired subjects. 1-16.

Knowles, W. B. (with J. G. Holland and E. P. Newlin). A correlational analysis of tracking behavior. 275-288.

Leiman, John M. (with J. Schmid). The development of hierarchical factor solutions. 53-62.

Lev, Joseph (with Elaine F. Kinder). New analysis of variance formulas for treating data from mutually paired subjects. 1-16.

Lord, Frederic M. A significance test for the hypothesis that two variables measure the same trait except for errors of measurement. 207-220.

Lubin, Ardie (with H. G. Osburn). A theory of pattern analysis for the prediction of a quantitative criterion. 63-74.

Lubin, Ardie (with H. G. Osburn). The use of configural analysis for the evaluation of test scoring methods. 359-372.

Lyerly, Samuel B. Review of "R. C. Bose, et al., Tables of Partially Balanced Designs with Two Associate Classes" and "F. B. Binet, et al., Analysis of Confounded Factorial Experiments in Single Replications." 300.

MacEwan, Charlotte (with P. Horst). Optimal test length for multiple prediction: the general case. 311-324.

Malone, R. Daniel (with J. W. Cotton and D. T. Campbell). The relationship between factorial composition of test items and measures of test reliability. 347-358.

Messick, S. J. (with G. W. Diederich and L. R Tucker). A general least squares solution for successive intervals. 159-174.

Newlin, E. P. (with W. B. Knowles and J. G. Holland). A correlational analysis of tracking behavior. 275-288.

Osburn, H. G. (with A. Lubin). A theory of pattern analysis for the prediction of a quantitative criterion. 63-74.

Osburn, H. G. (with A. Lubin). The use of configural analysis for the evaluation of test scoring methods. 359-372.

Rodgers, David A. A fast approximate [...] agreement between loadings and pred[...]

Sakoda, James M. (with B. H. Cohen). Exact probabilities for contingency tables using binomial coefficients. 83-86.

Schmid, John (with J. M. Leiman). The development of hierarchical factor solutions. 53-62.

Shepard, Roger N. Stimulus and response generalization: a stochastic model relating generalization to distance in psychological space. 325-346.

Simon, Herbert A. Amounts of fixation and discovery in maze learning behavior. 261-268.

Snell, J. Laurie (with J. G. Kemeny). Markov processes in learning theory. 221-230.

Solley, Charles M. Review of "S. Siegel, Nonparametric Statistics for the Behavioral Sciences." 298-299.

Swineford, Frances (with C. T. Fan). A method of score conversion through item statistics. 185-188.

Tryon, Robert C. Communality of a variable: formulation by cluster analysis. 241-260.

Tucker, L. R (with G. W. Diederich and S. J. Messick). A general least squares solution for successive intervals. 159-174.

Webster, Harold. Item selection methods for increasing test homogeneity. 395-403.

Wilcox, Richard H. A measure of coherence for human information filters. 269-274.

[...]ambling response-set, in objective tests. 289-299.




